PREFACE


The science of statistics in this country seems to have been subject
to two main influences. The biometric school associated with
Professor Karl Pearson have developed methods and concepts which
form the basis of the whole subject, and in more recent years the needs
of biological experimentalists have been met by developments in
theory due largely to Dr. R. A. Fisher. There are many text-books
on what may be regarded as the classical theory of statistics, and
Fisher's own methods are described in his book, Statistical
Methods for Research Workers. In the present book I have attempted
to present a single system of statistics, so that a reader with little
previous acquaintance may obtain a good working knowledge and
understanding of the methods available.

The first chapters deal with frequency distributions and constants,
and with the theory of errors, in orthodox manner, but in the later
chapters the underlying theme is Fisher's idea of the Analysis of
Variance; correlation is introduced as a special case of this. There
are, of course, other ways of regarding the subject, but its unity
seems to me to be brought out more by this than by any other
method of presentation.

In outlining the theory underlying the methods described, I have
given mathematical proofs where they are easy; but where they are
not, I have not hesitated merely to state the results. The necessary
mathematical attainments of readers are slight and include algebra
up to the binomial theorem and the elementary notions of the
calculus; even those who have to omit the mathematics entirely
should be able to arrive at an appreciation of the arguments.
Nevertheless, the theory of statistics is not easy, not so much because
it is abstruse, as because the ideas are new to most people, and a good
deal of hard thinking and patient work will be necessary.

I have drawn largely on the literature for examples, and it will be
obvious to biologists that I am not one, for some of the statistical
deductions I make are of no biological interest, although the methods
applied to other inquiries may be of great value. Statistics is a
tool, and the good craftsman knows what he wants to make with
his tool before he lifts it; it is only the small boy who uses it
promiscuously. In this book I have often been like the small boy, but if
readers will be small boys with me, they will learn how to handle
the tool, and then will be able to apply it with biological judgment.
There is no better way of learning statistics than by working through
examples.

I wish to thank Mr. F. D. Farrow, Mr. H. A. Hancock and
Miss R. Hodgson for their help and criticisms, and Mr. L.
Galloway and Mr. J. Gregory for providing me with unpublished
material for examples.


August, 1931


L. H. C. TIPPETT
PREFACE TO SECOND EDITION


For this edition I have revised the book considerably, and in
particular have re-written the first three chapters. There is now more
mathematical theory than before, and although I have not tried to
prove everything, I have aimed at giving typical proofs to show in
what way the various distributions and methods arise. These proofs
are in separate sub-sections, and may be omitted if desired. There
are other additions that call for no special comment.

Several topics that were dealt with in the first edition are now 
excluded. Tests of significance based on the correlation ratio are 
omitted because they are less convenient than the equivalent z-tests,
and the intra-class correlation coefficient because it is now only
of historical interest. I have not included the theory of ranking
because (1) it is not often needed, (2) when required it is usually
for application to small samples whereas the theory given before 
was for large samples, and (3) I am not aware of any convenient 
methods involving ranking that are as well founded as the other 
methods given in this book. 

I wish to acknowledge with thanks the criticisms, suggestions and 
corrections received from many friends and correspondents, and am 
particularly indebted to Dr. A. J. Turner for reading and comment- 
ing on the manuscript of this revised edition. 

L. H. C. T.

June, 1937
PREFACE TO THIRD EDITION

Every year I become aware of new directions in which I think this
book could be improved, and developments in the subject of statistics
make periodical revisions desirable. The changes I would wish
to make at this stage, however, are not sufficiently important to
justify any considerable amount of resetting of the type, particularly
as "there's a war on," and so this edition is the same as the second
except for a few minor corrections and one important addition.
The addition is a set of tables, given at the end of the book, of the
normal probability integral, …, and tables based on Fisher's z for
testing the significance of differences between variances. These
tables are a shortened and, in some instances, slightly modified form
of those in R. A. Fisher's Statistical Methods for Research Workers,
and are sufficient for many of the experimenter's everyday
requirements. I acknowledge with pleasure the generosity of Professor
Fisher and his publishers, Messrs. Oliver & Boyd, in allowing me
to reproduce these tables.

March, 1941



L. H. C. T. 






CONTENTS

PREFACE

INTRODUCTION
Experimental and Statistical Methods Compared
Terminology
Populations
Theory of Errors

CHAPTER I
FREQUENCY DISTRIBUTIONS AND CONSTANTS
Frequency Distributions
Formation of Frequency Tables
Frequency Constants
Proportionate Frequencies
Position
Variability or Dispersion
Shape
Computation of Moments
Moments from Ungrouped Data
Moments from Grouped Data

CHAPTER II
DISTRIBUTIONS DERIVED FROM THEORY OF PROBABILITY
Probability
Mathematical Probability
Statistical Probability
Binomial Distribution
Poisson Series
Use of the Binomial and Poisson Distributions for Testing Randomness
Normal Distribution
Determination of Normal Frequencies
Relation between Normal Frequencies and Standard Deviation
Practical Applicability of the Normal Distribution
Non-normal Curves
Sampling Distributions
Sampling Distribution of Mean
Deduction of Sampling Distributions of Mean and Standard Deviation


METHODS OF STATISTICS 


CHAPTER III
ERRORS OF RANDOM SAMPLING AND STATISTICAL INFERENCE
Random and Representative Samples
Tests of Significance
Significance of Difference between Two Sample Means
General Discussion
Groups of Samples
Applicability of Normal Sampling Theory
Tests of Significance Based on Skew Sampling Distributions
Sampling Errors of Various Constants
Means of Binomial and Poisson Distributions
Measures of Dispersion
Constants of Shape
Standard Errors of Functions of Statistical Constants
Determination of Population Value from Sample
Choice of Statistical Constants
Method of Maximum Likelihood

CHAPTER IV
GOODNESS OF FIT AND CONTINGENCY TABLES
Sampling Distribution of χ²
The Additive Nature of χ²
Contingency Tables
General Notes on the Distribution of χ²

CHAPTER V
SMALL SAMPLES
Variance Estimated from Small Samples
Significance of Means: The t Test
Essential Character of t
Significance of Differences Between Means
Significance of Differences Between Variances
Significance of Variations Between Several Samples
Small Samples from the Binomial and Poisson Distributions





Fig. 1. Frequency diagrams: (i) height in inches; (iii) distance between spiral reversals; (iv) number of vice-counties; (v) degrees of cloudiness.


contains many individuals; indeed for the latter it is the most
frequent, showing that most of the 266 species are found only in fewer
than eight vice-counties. This is the form of the "hollow curve"
mentioned in Willis's book, Age and Area, and similar distributions
are given by the number of petals of some flowers. The distribution
of degrees of cloudiness is of a very unusual form; it shows that
for July the sky is usually either nearly completely overcast or
completely clear, and that comparatively seldom is there a moderate
amount of cloudiness. Sometimes there are two or more humps, or
modes as they are called, as in the rays of chrysanthemums, and such
a distribution usually indicates some mixture of types in the data.
Care must be taken, of course, not to mistake some small
irregularity, such as often occurs, for a second mode.

Formation of Frequency Tables 

ITl. The determination of the appropriate size of sub-group of a
frequency table, or the number of classes into which the total range
is divided, is a compromise between giving too much and too little
detail. If the variate is continuous and the sample is large, it has
been found convenient to have between ten and twenty classes, but
if the number of readings is less than 100, fewer groups bring out
the essential features of the distribution better. Care should be
taken not to make the grouping too fine relative to the unit of
measurement of the character. In dealing with the age returns of
men, for instance, it is usually inadvisable to choose sub-ranges
of less than a year, for many people give their age last birthday, so
that the groups representing integral years contain an undue number
of observations, while those representing fractions have
comparatively few. If the variate is discrete, the natural unit of grouping
is the unit of variation, as in Table 1.1 (ii) and (vi), unless there
are too many units for the size of the sample and the distribution is
irregular; in such instances, several units may be combined to form
one group, as in the fourth example of Table 1.1, where the 71
vice-counties have been divided into 10 groups.

If the variate is continuous, the sub-ranges must also be
continuous; there must be no gap between the highest value in one
and the lowest in the next. If this is borne in mind when making
the measurements, there need be little difficulty in forming the
distribution, but sometimes the data are collected without reference
to the fact that they may have to be grouped, and then care is
necessary. A common way of recording data is correct to the nearest
unit. Suppose, for example, that the data in Table 1.3 are of a
continuous variate and are correct to the nearest unit; then the system
of classification into groups of three units is unsatisfactory in that
there is a gap between the sub-ranges: the first nominally stops
at 29 and the second starts at 30 units. Where does an individual
with an actual value of 29.7 units (say) go? The readings recorded
as 29 units may be anywhere between 28.5 and 29.5, so the
sub-range which nominally extends from 27 to 29 units actually
extends from 26.5 to 29.5; the nominal sub-range 30-32 actually
extends from 29.5 to 32.5 units, and so on. The actual sub-ranges
join up.
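The half-unit rule can be sketched in Python as follows. The function names `true_class_limits` and `assign_class` are hypothetical. Sending a value that lies exactly on a boundary to the upper class is an assumed convention; the text does not state one.

```python
def true_class_limits(nominal_low, nominal_high, unit=1.0):
    """Actual limits of a class whose members were recorded to the nearest
    `unit`: each nominal limit is extended by half a unit."""
    half = unit / 2
    return nominal_low - half, nominal_high + half


def assign_class(value, nominal_classes, unit=1.0):
    """Return the nominal class (low, high) whose actual range holds `value`.
    Actual ranges are taken as half-open [lower, upper), an assumed convention
    for values falling exactly on a boundary."""
    for low, high in nominal_classes:
        lower, upper = true_class_limits(low, high, unit)
        if lower <= value < upper:
            return (low, high)
    raise ValueError(f"{value} lies outside all classes")


# Nominal groups of three units, as in the example in the text.
classes = [(27, 29), (30, 32), (33, 35)]
print(true_class_limits(27, 29))     # (26.5, 29.5)
print(assign_class(29.7, classes))   # (30, 32)
```

Because the actual ranges meet at 29.5, 32.5 and so on, every real value belongs to exactly one class.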

Qualitative data may, of course, be classified and formed into a
frequency distribution in which the sub-groups are the qualities
described; in dealing with hair colour, for instance, red, fair, light brown,
dark brown, and jet black may be taken as the separate classes.

Frequency Constants 

1.20. Although frequency diagrams are useful in giving a visual
impression of the characteristics of a sample, the evidence of the
eye alone is not enough, and some numerical measure of the important
characteristics should be used for more exact description and com- 
parison. The determination of such a measure carries the process 
of condensing the original data a stage further than the formation 
of a frequency distribution, and this can only be done by ignoring 
further features as being irrelevant to the particular investigation. 
Sometimes a special constant may be designed for some particular 
purpose; for example in the kinetic theory of gases, the mean square 
velocity is a sufficient description of the distribution of velocities 
of the molecules for the purpose of establishing energy relationships. 
If the frequency diagram is of unusual shape, e.g. multi-modal, 
special methods of expression may be necessary, but for most 
purposes and for distributions commonly met with, there are 
standard frequency constants that have been devised to express 
various characteristics. These constants will be described here. It 
may be emphasised at the outset, however, that in using constants 
the investigator should always keep the practical problem in view 
and choose methods that are appropriate. There is no virtue in 
calculating statistical constants in a blind, routine manner, without
knowing first that they will be of some use. 
Proportionate Frequencies

1.21. For many purposes, the investigator is only interested in the
proportion of individuals having characters below or above a certain
value, or between two values. It can well be imagined that the U.S.
recruiting official would be interested in knowing the proportion
of men below (say) 65 inches in height, so that he would know to
what extent the number of recruits would fall off if he were to make
that the minimum height allowable, and from Table 1.1 (i) we see
that the ratio is 6 825 : 25 878, or about 26.4 per cent.

It usually (but not always) happens that the size of the sample 
is governed by some arbitrary conditions that have no objective 
significance, so that the distribution is expressed in a more general 
form if the frequencies are given as fractions or percentages of the 
total. 

When a distribution is represented diagrammatically by a histo- 
gram, the proportionate frequency between two limits is the area 
under the diagram between two ordinates drawn at those limits. 
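Reading a proportionate frequency as an area under the histogram can be sketched as follows. The function `proportion_between` and the grouped heights are hypothetical; they are not the data of Table 1.1. Within each sub-range the individuals are assumed to be spread evenly, which is exactly what the flat top of a histogram bar implies.

```python
def proportion_between(boundaries, freqs, a, b):
    """Proportionate frequency between limits a and b: the histogram area
    between ordinates at a and b, divided by the total area.
    boundaries: ascending class limits, one more than len(freqs)."""
    total = sum(freqs)
    area = 0.0
    for lo, hi, f in zip(boundaries, boundaries[1:], freqs):
        # Part of this class's sub-range lying between a and b.
        overlap = max(0.0, min(hi, b) - max(lo, a))
        area += f * overlap / (hi - lo)
    return area / total


# Hypothetical grouped heights in inches (NOT Table 1.1).
bounds = [60, 62, 64, 66, 68, 70]
freqs = [5, 20, 40, 25, 10]
print(proportion_between(bounds, freqs, 60, 65))   # 0.45

# The recruiting example in the text: 6 825 out of 25 878 men.
print(round(6825 / 25878, 3))                      # 0.264
```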

Frequencies and proportionate frequencies underlie nearly all
methods of statistical representation. Whatever constants may be 
calculated or however elaborate may be the analysis, the final inter- 
pretation is in terms of frequencies. If the data are sufficiently 
numerous and the problem is simple enough for an answer in terms 
of frequencies to be obtained directly, there is no need to compute 
further constants. 

Position 

1.22. The location or position of a frequency distribution is some
single measure describing a value of the variate about which the
observations are scattered. For example, it may be the bull's-eye
aimed at by our marksman. There are several constants for describing
this.

The most common constant is the mean, or average, which is the
sum of all the observations divided by their number.* It is well
known and understood, but while admitting the great usefulness of
the mean (it is perhaps the most useful constant of all), we would
remind readers that it only measures one characteristic of a
distribution, and that there are others of great importance.

* This is the arithmetic mean; the geometric and harmonic means are
not so much used in statistics.
An alternative measure of position is the median. If all the
observations in a sample are ranked in order of value, the median
is the value of the middle one if the total number is odd, or a value
between the two middle ones if the total number is even. The
number of individuals greater than the median value is equal to
the number that are smaller, and an ordinate drawn at that value
divides a histogram into two equal areas. The median is troublesome
to find if the sample is large and the grouping broad, for all the
individuals in the group containing the median have to be ranked in
order. In the third example of Table 1.1 there are 1 117 observations,
so the median is the value of the 559th placed in order, and lies
somewhere between 16.5 and 20.5 scale divisions. In that group
of 94 individuals the median is the 6th in order of magnitude, and
we may take it as being approximately $16.5 + \frac{6}{94} \times 4 = 16.76$
divisions.
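The interpolation used for the median can be sketched as a small function. The name `grouped_median` and its argument names are hypothetical. The count of 553 observations below 16.5 is not printed in the text; it is inferred from the statement that the 559th value is the 6th in its group. Like the worked example, the sketch handles only an odd total.

```python
def grouped_median(lower, width, count_below, group_freq, n):
    """Median of grouped data by linear interpolation within the median group.
    lower:       lower limit of the group containing the median
    width:       width of that group
    count_below: number of observations below `lower`
    group_freq:  number of observations in the median group
    n:           total number of observations (assumed odd, so the median
                 is the ((n + 1) / 2)th value in order)"""
    rank = (n + 1) // 2                  # 559 when n = 1 117
    position_in_group = rank - count_below
    return lower + position_in_group / group_freq * width


# Table 1.1 (iii): the 559th value is the 6th of the 94 in the group
# 16.5-20.5, so 553 observations lie below 16.5 (inferred, see above).
print(round(grouped_median(16.5, 4, 553, 94, 1117), 2))   # 16.76
```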

The mode is that value of the variate about which the observations 
are most concentrated, that is the value at which the ordinate of the 
frequency diagram is highest, and in Table 1.1 (i), for instance, it is
between 66 and 67 inches. It is not always easy to define accurately
in a sample, for it may be anywhere in the most frequent group,
or if the distribution is very flat at the top, and irregular, it is not
even easy to decide which would probably be the modal group in
the whole population; moreover, a distribution may have more than
one real mode, as has already been illustrated. Usually, however, if 
there is a single mode, the position can be found by assuming 
Pearson’s system of frequency curves as shown in section 1.24. 

When the distribution is symmetrical, the mean, median and mode
always coincide, and in all other single-modal cases the median comes
between the mean and the mode, the separation of these three
constants being a measure of the extent of asymmetry of the
distribution. For practical purposes, the following formula holds
approximately if the asymmetry is only moderate:

$$\text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median}).$$
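Rearranging the empirical relation gives Mode ≈ 3 Median − 2 Mean. This offers a rough way of placing the mode when the mean and median are known. The sketch below is only that. The function name `approximate_mode` and the numbers in the usage line are hypothetical, and the rule should be trusted only for moderately skew, single-humped distributions.

```python
def approximate_mode(mean, median):
    """Rough mode from the empirical rule Mean - Mode = 3(Mean - Median),
    i.e. Mode = 3*Median - 2*Mean (moderate asymmetry only)."""
    return 3 * median - 2 * mean


print(approximate_mode(mean=20.0, median=18.0))   # 14.0
```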

Of these constants, the mean is the most fundamental from a 
theoretical point of view and is the only one that can be used in 
further analysis of the data. For the mere representation of the 
central tendency of a distribution, the median is sometimes
recommended because it is said to be least affected by extreme individuals,
but for most symmetrical uni-modal distributions, the mean is the
most stable constant and is least affected by idiosyncrasies of the
particular sample. This result is derived from the theory of errors.
When the variate is the life of the individuals under some test of
endurance, e.g. the life of electric lamps when burnt at a standard
voltage, the median may be an economical constant to determine;
for when half the individuals have failed, the median value may be
determined without any further testing. Since the mode is the most
typical value, it may be the most suitable single constant to use
when the distribution is very skew.

Variability or Dispersion

1.23. There are several constants devised to measure the degree of
variation or dispersion of the variate, which, in our analogy of
section 1.1, is the quality that distinguishes the marksmen. The
most fundamental of these are the second moment, or variance, and
its square root, called the standard deviation. The second moment is
the mean squared deviation from the mean, and may be written

$$\mu_2 = \frac{S(x - \bar{x})^2}{N},$$

where $\mu_2$ is the second moment, $x$ the successive values of the variate
in the sample, $\bar{x}$ the mean, $N$ the number of individuals, and $S$ the
summation for all values of $x$. The standard deviation is

$$\sigma = \sqrt{\mu_2},$$

and will usually be denoted by the symbol $\sigma$. When the standard
deviation is expressed as a percentage of the mean, it is called the
coefficient of variation. The computation of some of these constants
will be illustrated at the end of the chapter.
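As a check on the definitions, the sketch below evaluates them directly for the ten values that appear later in Table 1.4. Following the text, it divides by $N$, not $N - 1$. The variable names are illustrative only.

```python
import math

# The ten values of Table 1.4.
x = [67, 70, 76, 73, 67, 73, 70, 73, 70, 79]

N = len(x)
mean = sum(x) / N
# Second moment: mean squared deviation from the mean (divisor N).
mu2 = sum((xi - mean) ** 2 for xi in x) / N
sd = math.sqrt(mu2)
cv = 100 * sd / mean          # coefficient of variation, per cent

print(round(mean, 2), round(mu2, 2), round(sd, 2), round(cv, 2))
# 71.8 12.96 3.6 5.01
```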

Another measure, the mean deviation, is often used. It is the sum of
the deviations from the mean, irrespective of sign, divided by the
number of observations. For frequency distributions of the "normal"
form (which will be described in section 2.5), the standard deviation
is 1.253 times the mean deviation. This is the basis of Peters's method
of estimating the standard deviation.
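For the normal form the factor 1.253 is $\sqrt{\pi/2}$. The sketch below applies it to the Table 1.4 values. It only illustrates the arithmetic: ten observations are far too few to show how well the method works, and the factor holds only if the population is normal. The name `PETERS_FACTOR` is hypothetical.

```python
import math

x = [67, 70, 76, 73, 67, 73, 70, 73, 70, 79]   # Table 1.4

mean = sum(x) / len(x)
# Mean deviation: average of the deviations from the mean, ignoring sign.
mean_dev = sum(abs(xi - mean) for xi in x) / len(x)

# Normal form: sigma = sqrt(pi / 2) * mean deviation, about 1.253.
PETERS_FACTOR = math.sqrt(math.pi / 2)
sd_estimate = PETERS_FACTOR * mean_dev

print(round(mean_dev, 2), round(PETERS_FACTOR, 3), round(sd_estimate, 2))
# 3.0 1.253 3.76
```

The direct value of the standard deviation for these data is 3.6, so the estimate is only roughly right for so small a sample.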

The quartile deviation is the deviation measured on either side of
the mean, so chosen that as many observations lie beyond the quartile
value as lie between it and the median, and ordinates drawn at the
median and at the two quartiles divide the area under a histogram
into four equal parts. If the distribution is asymmetrical, the upper
and lower quartile deviations are not equal. The average of the two
is half the difference between the two quartile values and is called
the semi-interquartile distance. In Table 1.1 (iii), if all the
observations were placed in order, we should have:

279 obs. | first quartile | 279 obs. | median | 279 obs. | third quartile | 279 obs.

The first quartile lies between 8.5 and 10.5 scale divisions, and
the third between 28.5 and 32.5 divisions, and the quartile
deviations are the distances of these values from the mean. Historically,
this was one of the earliest measures of dispersion used, but there
is no reason why the deviation of the ordinates dividing the frequency
diagram in any other proportions should not be used and, indeed,
K. Pearson has shown (1920) that the deviation of the ordinate which
cuts off i^th of the area determines the dispersion most accurately
for the "normal" distribution.

TABLE 1.2

Number in Sample    Mean Range / Standard Deviation
        2                       1.128
        5                       2.326
       10                       3.078
       50                       4.498
      100                       5.015
      500                       6.073

At first sight the range, which is the difference between the greatest
and smallest members of a sample, would appear to be the most
natural index of dispersion; but it unfortunately varies for samples
of different sizes taken from the same population, being smaller
for the smaller samples. When the population form and sample
size are constant, however, the mean range is proportional to the
standard deviation, and if their ratio is known the latter can be
found from the former. For the "normal" form of distribution,
full tables are given in Biometrika, XVII (1925), p. 386, of the
ratio

$$\frac{\text{Mean Range}}{\text{Standard Deviation}}$$

for samples of between 2 and 1 000 individuals, and an abstract of
them is given in Table 1.2; from this we may see how much the range
depends on the size of the sample. Of course, single values of the
range are liable to rather large chance errors, but if it is convenient
to collect the data in groups of 5 or 10 (say), the mean range can be
found quite quickly, and when divided by the appropriate constant
obtained from the tables, can be converted to the standard deviation.
The ratio does not seem to be very sensitive to moderate changes
in the form of the distribution, and this method may be used in most
practical work when the frequency distribution is uni-modal
and tails off fairly gradually to zero at the extremes. Because of its
ease of computation, the range is convenient in routine work. In
Table 1.3 there are 100 individuals from an artificially constructed
"normal" population, and these are arranged in random groups of 5.
The ranges are given, and their mean is 22.3. The constant given in
Table 1.2 for samples of 5 is 2.326, whence we obtain for an estimate
of the standard deviation a value 22.3/2.326 = 9.6; this is not far
from the true standard deviation, which happens to be 10 units.

TABLE 1.3
(each column is one random group of 5)

          54    55    41    70    42    50    53    31    49    47
          49    59    51    56    40    52    52    56    41    28
          50    49    57    57    44    53    72    56    53    60
          53    54    32    47    41    43    56    59    53    47
          70    62    56    44    54    49    50    35    53    38
Means   55.2  55.8  47.4  54.8  44.2  49.4  56.6  47.4  49.8  44.0
Ranges    21    13    25    26    14    10    22    28    12    32

          54    27    39    52    49    57    28    48    36    48
          27    46    58    65    49    38    51    42    48    47
          48    60    66    60    44    55    49    46    55    42
          41    62    59    41    49    56    50    62    53    69
          47    48    63    62    64    47    53    59    47    46
Means   43.4  48.6  57.0  56.0  51.0  50.6  46.2  51.4  47.8  50.4
Ranges    27    35    27    24    20    19    25    20    19    27

This method of obtaining the standard deviation is only equivalent 
to the other method when the individuals are mixed and divided 
into sub -samples at random. 
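The range method can be sketched in Python using the 100 values of Table 1.3. Each column of the printed table is one random group of 5. The constant 2.326 is the Table 1.2 entry for samples of 5. The helper names (`columns`, `sd_from_mean_range`, `D_5`) are hypothetical.

```python
# Table 1.3, as two blocks of 5 rows x 10 columns; each column is a group of 5.
BLOCK_1 = [
    [54, 55, 41, 70, 42, 50, 53, 31, 49, 47],
    [49, 59, 51, 56, 40, 52, 52, 56, 41, 28],
    [50, 49, 57, 57, 44, 53, 72, 56, 53, 60],
    [53, 54, 32, 47, 41, 43, 56, 59, 53, 47],
    [70, 62, 56, 44, 54, 49, 50, 35, 53, 38],
]
BLOCK_2 = [
    [54, 27, 39, 52, 49, 57, 28, 48, 36, 48],
    [27, 46, 58, 65, 49, 38, 51, 42, 48, 47],
    [48, 60, 66, 60, 44, 55, 49, 46, 55, 42],
    [41, 62, 59, 41, 49, 56, 50, 62, 53, 69],
    [47, 48, 63, 62, 64, 47, 53, 59, 47, 46],
]

# Mean range / standard deviation for samples of 5 (Table 1.2).
D_5 = 2.326


def columns(block):
    """Split a block of rows into its column groups."""
    return [list(col) for col in zip(*block)]


def sd_from_mean_range(groups, ratio):
    """Estimate sigma as (mean range of the groups) / ratio."""
    ranges = [max(g) - min(g) for g in groups]
    mean_range = sum(ranges) / len(ranges)
    return ranges, mean_range, mean_range / ratio


groups = columns(BLOCK_1) + columns(BLOCK_2)
ranges, mean_range, sd_estimate = sd_from_mean_range(groups, D_5)
print(round(mean_range, 1), round(sd_estimate, 1))   # 22.3 9.6
```

Taking the columns as the groups matters. As the text notes, the method is equivalent to the direct one only when the groups are formed at random.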

All these measures of dispersion may be rather bewildering to the 
reader, but they are simply so many alternatives, which are exactly 
related for any given form of frequency distribution. In view of the 
theory of the next chapter and section 3.7, it will be well to regard 
the standard deviation as the fundamental measure of scatter, and 
the other constants as approximations to it. It is sometimes thought 
that these measures of variability apply only to distributions of the
"normal" type, but this is not so; they are equally applicable to
skew distributions, and, provided the shape is the same, may be
used to compare one distribution with another.

A qualitative appreciation of dispersion or variation may be
acquired with little experience, and for comparative purposes any
of the above measures may be appreciated fairly easily; the more
precise interpretation of the standard deviation in terms of frequencies
is dealt with in section 2.52.

Shape 

1.24. The shape of a uni-modal frequency distribution may vary
in two ways: in the degree of asymmetry, and in the flatness of the
mode. This flatness of the mode (or kurtosis) is different from that
flatness of the curve as a whole which arises from the dispersion,
and is illustrated in Fig. 2, where there are several curves having
the same standard deviation but varying kurtosis. These properties
may be measured by constants derived from the third and fourth
moments of the distribution. The third moment is

$$\mu_3 = \frac{S(x - \bar{x})^3}{N},$$

and the fourth moment

$$\mu_4 = \frac{S(x - \bar{x})^4}{N},$$

where the symbols on the right-hand side of the equations have the
same meaning as those used in defining the second moment. K. Pearson
has derived two constants from the moments, which are independent
of the dispersion of the distribution, and describe its shape.
They are

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \quad\text{and}\quad \beta_2 = \frac{\mu_4}{\mu_2^2}.$$

$\beta_1$ is zero if the distribution is symmetrical, while $\beta_2$ is usually about
3 for a curve like that of Fig. 1 (i), is smaller if the mode is flatter,
and larger if the curve is more sharply peaked. In Fig. 2 are given
a few smooth frequency curves with different values of $\beta_1$ and $\beta_2$.
They all have positive skewness, and a similar series with negative
skewness can be imagined.
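The third and fourth moments and Pearson's $\beta_1$ and $\beta_2$ can be computed directly from their definitions. The sketch below uses the ten values of Table 1.4 purely as arithmetic practice, since so small a sample says little about shape. The helper `central_moment` is a hypothetical name.

```python
x = [67, 70, 76, 73, 67, 73, 70, 73, 70, 79]   # Table 1.4


def central_moment(data, s):
    """s-th moment about the mean: S(x - mean)**s / N."""
    n = len(data)
    m = sum(data) / n
    return sum((xi - m) ** s for xi in data) / n


mu2, mu3, mu4 = (central_moment(x, s) for s in (2, 3, 4))
beta1 = mu3 ** 2 / mu2 ** 3     # zero for a symmetrical distribution
beta2 = mu4 / mu2 ** 2          # about 3 for the "normal" form
print(round(mu3, 3), round(mu4, 4), round(beta1, 4), round(beta2, 4))
```

For these values $\mu_3 = 21.384$ and $\mu_4 = 409.7952$. The small positive $\mu_3$ reflects the single long value, 79, on the right.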

The degree of skewness may be more simply measured by the
quantity

$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}.$$

This quantity is negative if the curve shows a bias to the right
(assuming that ascending values of the variate are taken from left
to right) and has the longer tail to the left. The position of the
mode, as we have seen, is not easy to define, but if the curve comes
within Pearson's* system, the following formula may be used:

$$\text{Skewness} = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)},$$

in which $\sqrt{\beta_1}$ is given the same sign as $\mu_3$. From this we obtain the
equation for fixing the mode of a distribution relative to its mean,

$$\text{Mean} - \text{Mode} = \frac{\sigma\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}.$$

These constants of shape are rather difficult to interpret, and it is 
only after considerable experience that their practical import can 
be appreciated. 


* This will be mentioned in section 2.6.

Computation of Moments
Moments from Ungrouped Data

1.31. It is always possible to compute the mean and the higher
moments in a sample by straightforward evaluation of the
expressions given in sections 1.22-1.24 as definitions. For example, the
mean and second moment are calculated directly in the first three
columns of Table 1.4 for a small sample of ten individuals. We have
not yet dealt with the theory of methods for such small samples,
and this one is given here merely to illustrate arithmetical processes
that are applicable to small and large samples. The sums of the
columns are given at the foot of the table, and from them it is seen
that the mean $\bar{x} = 71.8$ and the second moment $\mu_2 = 12.96$.
Such calculations are much facilitated by making a transformation
of the values of the variate, measuring them as deviations from
some arbitrary origin or "working mean" and dividing the deviations
by an arbitrary constant; this last step is equivalent to representing
the variation on an arbitrary scale. If $x'$ is the transformed variate,
$X$ the arbitrary origin, and $h$ the arbitrary constant or scale,

$$x' = \frac{x - X}{h} \quad\text{and}\quad x = X + hx'. \qquad (1.1)$$

TABLE 1.4

         x    x − x̄   (x − x̄)²    x′    x′²
        67    -4.8     23.04     -1     1
        70    -1.8      3.24      0     0
        76    +4.2     17.64     +2     4
        73    +1.2      1.44     +1     1
        67    -4.8     23.04     -1     1
        73    +1.2      1.44     +1     1
        70    -1.8      3.24      0     0
        73    +1.2      1.44     +1     1
        70    -1.8      3.24      0     0
        79    +7.2     51.84     +3     9
Sums   718     0.0    129.60     +6    18


Further, let the mean of the values of $x'$ be $\bar{x}'$, and the mean of
their $s$th power be $\mu'_s$,* so that

$$\bar{x}' = \frac{Sx'}{N}, \qquad \mu'_s = \frac{S(x'^s)}{N}.$$

* The letter $\mu$ means that the constant is the mean of some power of
deviations, the subscript $s$ denotes the order of the power (e.g. for the
second moment, $s = 2$) and the prime indicates that the deviations are from
an arbitrary origin in arbitrary units. Consistent with this notation $\bar{x}' = \mu'_1$.
Where there is no prime, the deviations are measured from the mean in the
units of the variate, and $\mu_s$ is the $s$th moment.
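Then it may be shown that:

$$\bar{x} = X + h\bar{x}',$$
$$\mu_2 = h^2\{\mu'_2 - (\mu'_1)^2\},$$
$$\mu_3 = h^3\{\mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3\},$$
$$\mu_4 = h^4\{\mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4\}. \qquad (1.2)$$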

To illustrate the use of these equations the mean and second moment
have been calculated for the data of Table 1.4. The arbitrary origin
is $X = 70$, the constant $h = 3$, and the values of $x'$ are given in the
fourth column. From their sum, $\bar{x}' = 0.6$ and

$$\bar{x} = 70 + 3 \times 0.6 = 71.8.$$

The sum of the values of $x'^2$, given in the fifth column, gives
$\mu'_2 = 1.8$, and from equations (1.2),

$$\mu_2 = 9(1.8 - 0.36) = 12.96.$$

These results agree with those calculated directly, as indeed they
should. Readers are recommended to calculate the third and fourth
moments for Table 1.4 by the two methods.
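The suggested exercise can be checked with a short sketch. It computes the $\mu'_s$ from the transformed values ($X = 70$, $h = 3$, as in the text) and applies equations (1.2). The results are then compared with moments taken directly about the mean. The dictionary `m` and the function `direct` are hypothetical names.

```python
x = [67, 70, 76, 73, 67, 73, 70, 73, 70, 79]   # Table 1.4
X, h = 70, 3                                   # working mean and scale
N = len(x)

# Transformed variate x' = (x - X) / h, and mu'_s = S(x'**s) / N.
xp = [(xi - X) / h for xi in x]
m = {s: sum(v ** s for v in xp) / N for s in (1, 2, 3, 4)}

# Equations (1.2).
mean = X + h * m[1]
mu2 = h**2 * (m[2] - m[1]**2)
mu3 = h**3 * (m[3] - 3*m[2]*m[1] + 2*m[1]**3)
mu4 = h**4 * (m[4] - 4*m[3]*m[1] + 6*m[2]*m[1]**2 - 3*m[1]**4)


def direct(s):
    """s-th moment computed directly from deviations about the mean."""
    return sum((xi - mean) ** s for xi in x) / N


for s, value in ((2, mu2), (3, mu3), (4, mu4)):
    print(s, round(value, 4), round(direct(s), 4))
```

Both columns of output should agree: 12.96, 21.384 and 409.7952.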

1 . 311 . The proof of equations (1.2) depends on applying the 
ordinary rules of addition to summations denoted by the sign S* 
The first is that the summation of a compound term itself made 
up of the sum of several terms is equal to the sum of the summations 
of the separate terms. For example, 

The second rule is that the summation of a term that is 
multiplied by a factor constant for all values of the term summed, 
is equal to that constant multiplied by the summation of the term. 
For example, 

Shx' = hSx'. 



The general significance of these rules should be appreciated. 

From equation (i.i) and the above examples of the summation 

* Readers will do well to master these rules for the sake of following later 
chapters. 
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rules the first part of equation (1.3) follows easily. It also follows 
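that

$$X = X_0 + hx' \quad \text{and} \quad X - \bar X = h(x' - \bar x'),$$

so that, if N is the number of individuals,

$$\mu_s = \frac{1}{N}S(X - \bar X)^s = \frac{1}{N}S\{h(x' - \bar x')\}^s.$$

Taking the constant term outside the S sign and expanding the term in brackets by the binomial theorem, we find that

$$\mu_s = \frac{h^s}{N}S\left\{x'^s - s\,x'^{s-1}\bar x' + \frac{s(s-1)}{2!}x'^{s-2}\bar x'^2 - \cdots + (-1)^s\bar x'^s\right\}.$$

Summing this term by term, and remembering that $\bar x' = \mu'_1$ and is constant for all individuals in the sample, we combine the last two terms, and find that

$$\mu_s = h^s\left\{\mu'_s - s\,\mu'_{s-1}\mu'_1 + \frac{s(s-1)}{2!}\mu'_{s-2}\mu'^2_1 - \cdots + (-1)^{s-1}(s-1)\mu'^s_1\right\}.$$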

If s is successively put equal to 2, 3 and 4, the last three of equations (1.2) result.
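The general result can be checked numerically for s = 2, 3 and 4. The sketch below, which uses only the Python standard library, evaluates the binomial expansion term by term for an arbitrary set of coded values and compares it with the moments computed directly about the mean; `central_from_raw` and `raw_moments` are hypothetical helper names, and the random sample is purely illustrative.

```python
import random
from math import comb


def raw_moments(values, max_power):
    """mu'_0 ... mu'_max_power of coded values (mu'_0 is always 1)."""
    N = len(values)
    return [sum(v ** k for v in values) / N for k in range(max_power + 1)]


def central_from_raw(raw, s, h=1.0):
    """mu_s from raw moments by the binomial expansion of section 1.311.

    mu_s = h**s * sum_j C(s, j) * mu'_{s-j} * (-mu'_1)**j.  The terms j = s-1
    and j = s are kept separate here; adding them gives the combined last
    term (-1)**(s-1) * (s-1) * mu'_1**s of the text.
    """
    m1 = raw[1]
    return h ** s * sum(comb(s, j) * raw[s - j] * (-m1) ** j for j in range(s + 1))


random.seed(1)
sample = [random.randint(-5, 5) for _ in range(50)]  # arbitrary coded values
raw = raw_moments(sample, 4)
mean = raw[1]
for s in (2, 3, 4):
    direct = sum((v - mean) ** s for v in sample) / len(sample)
    print(s, central_from_raw(raw, s), direct)
```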


Moments from Grouped Data 

1.32. When calculating the moments of a frequency distribution, it is usual to assume that all the individuals in any one group have the central value of that group, i.e. the value midway between the limits of the sub-range. For Table 1.1 (i) the central values are 51.5, 52.5, etc., inches. These values may be transformed as shown in section 1.31. Then to calculate the moments, instead of summing over all individuals, it is possible to multiply the appropriate power of the transformed group value by the frequency and to sum these products over all groups. Let $x'_t$ be the $t$th group value (the subscript denotes that the transformed variate can only take discrete values corresponding to the central values of the groups), $n_t$ the frequency in that group, and S the summation over all groups; then for such data,

$$\nu'_s = \frac{S(n_t x'^s_t)}{S(n_t)}. \qquad (1.3)$$

The letter $\nu'$ is used for the mean of a power of a transformed variate calculated from grouped data where $\mu'$ would be used if the data were ungrouped.

For example, the data in Table 1.4 may be grouped, and the sum of the fourth column is

$$2 \times (-1) + 3 \times 0 + 3 \times 1 + 1 \times 2 + 1 \times 3 = 6.$$

Using equations (1.3) and applying equations (1.2) to the resulting means as shown above, it is possible to compute the crude mean and moments from grouped data. These moments are denoted by $\nu_2$, $\nu_3$ and $\nu_4$. For a continuous variate some of the crude moments $\nu$ differ slightly from the true moments, denoted by $\mu$, even when calculated from very large (infinite) samples, because the discontinuous form of a frequency distribution with finite groups is only an approximation to the continuous form that is possible with a continuous variate. Sheppard's corrections, when applied to such crude moments calculated from a table with uniform sub-ranges, give values that approximate more closely to the true moments. The corrected mean and moments so obtained are:

$$\bar X = \text{crude mean},$$
$$\mu_2 = \nu_2 - \frac{h^2}{12},$$
$$\mu_3 = \nu_3,$$
$$\mu_4 = \nu_4 - \frac{h^2}{2}\nu_2 + \frac{7h^4}{240}. \qquad (1.4)$$

The above transformations make the computation of moments from grouped data a comparatively easy matter if the arbitrary origin $X_0$ is chosen to be the central value of one of the groups near the centre of the whole distribution, and h is made equal to the sub-range, so that the values of the transformed variate $x'_t$ are 0, 1, 2, 3, . . ., −1, −2, −3, . . . The corrections (1.4) can then be applied to the arbitrary moments, putting h equal to 1, and the final true moments found by multiplying by $h^2$, $h^3$ and $h^4$ respectively; the β coefficients, being ratios, are found directly.
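As a minimal sketch, Sheppard's corrections (1.4) can be written as one small function. The inputs in the usage lines are the crude moments about the mean printed under Table 1.5 (in arbitrary units, so h = 1); because those inputs are already rounded, the last figure of the output may differ slightly from the table.

```python
def sheppard(nu2, nu3, nu4, h=1.0):
    """Sheppard's corrections (1.4) for crude central moments from grouped data.

    h is the width of the (uniform) sub-range in the units the moments are in;
    the mean and the third moment are left unchanged.
    """
    mu2 = nu2 - h ** 2 / 12
    mu3 = nu3
    mu4 = nu4 - h ** 2 * nu2 / 2 + 7 * h ** 4 / 240
    return mu2, mu3, mu4


# Crude moments about the mean from Table 1.5, in arbitrary units (sub-range = 1).
mu2, mu3, mu4 = sheppard(7.39927, -2.3003, 159.334, h=1.0)
beta2 = mu4 / mu2 ** 2
print(round(mu2, 5), round(mu4, 3), round(beta2, 4))
```

Multiplying the corrected values by $h^2$, $h^3$ and $h^4$ then converts them to the units of the variate, as described above.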

The whole process is followed in the example of Table 1.5, the data being the heights of 1 078 fathers (Pearson and Lee, 1903). Column (1) contains the sub-ranges, and column (2) the values of the centres of the sub-ranges in the arbitrary units, $x'_t$. Column (3) contains the frequencies (the 0.5 frequencies arise because when an observation falls exactly on the border-line between two groups,
TABLE 1.5

(1)         (2)            (3)          (4)           (5)             (6)
Stature     Arbitrary      Frequency
(inches)    units $x'_t$   $n_t$        $n_t x'_t$    $n_t x'^2_t$    $n_t x'^3_t$

58.5–       −9             3            −27           243             −2 187
59.5–       −8             3.5          −28           224             −1 792
60.5–       −7             8            −56           392             −2 744
61.5–       −6             17           −102          612             −3 672
62.5–       −5             33.5         −167.5        837.5           −4 187.5
63.5–       −4             61.5         −246          984             −3 936
64.5–       −3             95.5         −286.5        859.5           −2 578.5
65.5–       −2             142          −284          568             −1 136
66.5–       −1             137.5        −137.5        137.5           −137.5
67.5–       0              154          –             –               –
68.5–       1              141.5        141.5         141.5           141.5
69.5–       2              116          232           464             928
70.5–       3              78           234           702             2 106
71.5–       4              49           196           784             3 136
72.5–       5              28.5         142.5         712.5           3 562.5
73.5–       6              4            24            144             864
74.5–       7              5.5          38.5          269.5           1 886.5
Total                      1 078        −326          8 075           −9 746

$\nu'_1 = -326/1\,078 = -0.302\,41$

$\nu'_2 = 8\,075/1\,078 = 7.490\,72$
$-\nu'^2_1 = -0.091\,45$
$\nu_2 = 7.399\,27$
$-\frac{1}{12} = -0.083\,33$
$\mu_2 = 7.315\,94$
$\sigma = 2.704\,8$ inches

$\nu'_3 = -9\,746/1\,078 = -9.040\,8$
$-3\nu'_2\nu'_1 = +6.795\,8$
$\nu'_3 - 3\nu'_2\nu'_1 = -2.245\,0$
$+2\nu'^3_1 = -0.055\,3$
$\mu_3 = \nu_3 = -2.300\,3$
$\beta_1 = (-)\,0.013\,513$
TABLE 1.5 (continued)

(7)             (8)                           (9)          (10)                 (11)              (12)        (13)
$n_t x'^4_t$    $(X_t - \bar X)/\sigma$       $A$          Cumulative $NA$      $n_t$ (expected)  $z$         $y = Nz/\sigma$

19 683          −3.030 8                      0.001 22     1.3                  1.3               0.004 0     1.6
14 336          −2.661 0                      0.003 90     4.2                  2.9               0.011 6     4.6
19 208          −2.291 3                      0.010 97     11.8                 7.6               0.028 9     11.5
22 032          −1.921 6                      0.027 33     29.5                 17.7              0.063 0     25.1
20 937.5        −1.551 9                      0.060 34     65.0                 35.5              0.119 7     47.7
15 744          −1.182 2                      0.118 56     127.8                62.8              0.198 3     79.0
7 735.5         −0.812 5                      0.208 35     224.5                96.7              0.286 8     114.3
2 272           −0.442 8                      0.328 96     354.6                130.1             0.361 7     144.2
137.5           −0.073 1                      0.470 86     507.6                153.0             0.397 9     158.6
–               0.296 7                       0.616 65     664.7                157.1             0.381 8     152.2
141.5           0.666 4                       0.747 42     805.7                141.0             0.319 5     127.3
1 856           1.036 1                       0.849 92     916.2                110.5             0.233 2     92.9
6 318           1.405 8                       0.920 10     991.9                75.7              0.148 5     59.2
12 544          1.775 5                       0.962 09     1 037.1              45.2              0.082 5     32.9
17 812.5        2.145 2                       0.984 03     1 060.8              23.7              0.040 0     15.9
5 184           2.514 9                       0.994 04     1 071.6              10.8              0.016 9     6.7
13 205.5        2.884 6                       0.998 04     1 075.9              6.4               0.006 2     2.5
179 147                                                                         1 078.0

$\nu'_4 = 179\,147/1\,078 = 166.185$
$-4\nu'_3\nu'_1 = -10.936$
$\nu'_4 - 4\nu'_3\nu'_1 = 155.249$
$+6\nu'_2\nu'^2_1 = +4.110$
subtotal $= 159.359$
$-3\nu'^4_1 = -0.025$
$\nu_4 = 159.334$
$-\frac{1}{2}\nu_2 = -3.700$
subtotal $= 155.634$
$+\frac{7}{240} = +0.029$
$\mu_4 = 155.663$
$\beta_2 = 2.908\,3$

mean − mode $= -0.170\,1$
mean $= 68.0 - 0.302\,4 = 67.697\,6$ inches
mode $= 67.867\,7$ inches
a half is put into each), and columns (4), (5), (6) and (7) are obtained successively from the previous one by multiplying by the corresponding units in column (2). In column (4) −27 = 3 × (−9), in column (5) 243 = −27 × (−9), in column (6) −2 187 = 243 × (−9) and in column (7) 19 683 = −2 187 × (−9), and so on for the other rows. The arithmetic may be checked by multiplying each frequency directly by $x'^4_t$ and so checking column (7) term by term; if that is correct, the other columns from which it was obtained must almost certainly be right. The sums of these columns (paying regard to sign) give the $\nu'$ coefficients, and these are corrected step by step below the table. The resulting moments are in inch-units already, since the sub-groups are 1 inch wide. The mean − mode is found from the formula on p. 34. Since $\nu'_1 = -0.302\,4$, the mean is 0.302 4 inch less than the arbitrary origin, which is 68.0 inches (the centre of the group at which $x'_t = 0$). Hence mean height = 68.0 − 0.302 4 = 67.697 6 inches. We shall refer to columns (8) to (13) of Table 1.5 in the next chapter.
In this example, the constants have been calculated correct to 
several decimal places. This has been done advisedly, for when 
complicated computations are performed, errors due to “dropping 
figures” too soon are apt to accumulate and become very large, and 
it is always well to be on the safe side. The number of figures used 
in computing this example is near the minimum advisable. The 
final result may be “rounded off” to a few figures, if it is not going 
to be used in further calculations. 
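The whole calculation of Table 1.5 can be sketched in Python as follows, assuming the frequencies as printed in column (3). It forms the column sums, the crude moments (1.3), the moments about the mean (1.2), Sheppard's corrections (1.4) and the β coefficients. The mode is left out because the formula on p. 34 is not reproduced in this section. Unlike the hand computation, the script carries full floating-point precision throughout, which is one way of avoiding the errors from dropping figures too soon mentioned above.

```python
from math import sqrt

# Table 1.5, columns (2) and (3): coded group values x'_t and frequencies n_t.
# Origin X0 = 68.0 inches (centre of the group 67.5-68.5), sub-range h = 1 inch.
x = list(range(-9, 8))
freq = [3, 3.5, 8, 17, 33.5, 61.5, 95.5, 142, 137.5, 154,
        141.5, 116, 78, 49, 28.5, 4, 5.5]
X0, h = 68.0, 1.0

N = sum(freq)
# Column sums (4)-(7); column (7) is formed directly from n_t * x'_t**4 as a check.
col = {s: sum(f * xt ** s for f, xt in zip(freq, x)) for s in (1, 2, 3, 4)}
nu1, nu2p, nu3p, nu4p = (col[s] / N for s in (1, 2, 3, 4))  # equation (1.3)

# Crude moments about the mean: equations (1.2) with h = 1.
nu2 = nu2p - nu1 ** 2
nu3 = nu3p - 3 * nu2p * nu1 + 2 * nu1 ** 3
nu4 = nu4p - 4 * nu3p * nu1 + 6 * nu2p * nu1 ** 2 - 3 * nu1 ** 4

# Sheppard's corrections (1.4) in arbitrary units, then back to inches.
mu2 = (nu2 - 1 / 12) * h ** 2
mu3 = nu3 * h ** 3
mu4 = (nu4 - nu2 / 2 + 7 / 240) * h ** 4

mean = X0 + h * nu1
sigma = sqrt(mu2)
beta1 = mu3 ** 2 / mu2 ** 3  # the sign of mu3 must be noted separately
beta2 = mu4 / mu2 ** 2
print(N, col, round(mean, 4), round(sigma, 4), round(beta1, 6), round(beta2, 4))
```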



CHAPTER II


DISTRIBUTIONS DERIVED FROM THEORY OF 

PROBABILITY 


Probability 

2.1. There are events about which our knowledge is so complete 
that we are able to predict with complete certainty whether or not 
they will occur, and on the other hand there are events about which 
we know nothing, so that we are unable to make any prediction. 
An event of the first kind is the falling of a penny tossed into the
air (we are certain it will fall), and one of the second kind is the
falling of the penny with the head (say) uppermost. Between these
two extremes there are events about which we know something, but 
not enough to allow of certain prediction; these are the province 
of the theory of probability. The extent to which we can rely on a 
prediction varies in degree for different events depending on the 
amount of knowledge we have of the factors that determine the event, 
and a measure of the degree is called the probability of the event. 

Probability is expressed on a somewhat arbitrary scale of numbers 
between unity and zero, a value of 1 or 0 corresponding to certainty
that the event will or will not occur, and intermediate values to
intermediate degrees of certainty; a probability of 0.5 means that
the event is as likely to occur as not. Defined in this vague way, 
there seems to be little objectivity about probability as a measure, 
but precision is achieved by giving the word a special meaning and 
concreteness by applying it to a restricted field. These two steps 
result in mathematical and statistical views of probability. By 
extending the conceptions of these special fields to the events of 
ordinary life, the probability of any event is usually expressed as 
chances for or against the event, and the analogy makes the idea of 
probability fairly definite and concrete. We shall see how this arises 
in the next two sections. 

* A general calculus of probabilities has been developed for application 
to the general problem of expressing the relations between events and the 
amount of knowledge of the determining factors, or between propositions 
and the data on which they are based. The application of this calculus has 
not received universal acceptance, but this need not concern us, as we are 
only interested here in the mathematical and statistical applications that are 
almost, if not quite, universally accepted. 
Mathematical Probability

2.11. This view of probability is based on a conception of a number 
of equally likely chances, some of which are favourable to the 
occurrence of the event and some of which are unfavourable. The 
ratio of the favourable to the total chances is the probability of the 
event. This leads to a similar scale to that mentioned above, varying 
between zero and unity. It is usual and helpful to imagine an 
experiment like that of throwing a perfect six-sided die, all sides of 
which are equally likely to turn up. Only one side has a six, so the 
probability of turning up a six is 1/6. Other idealised games of chance
suggest other typical experiments: drawing blindfold from a bag
of well-mixed balls that differ only in colour, or drawing cards from 
a well-shuffled pack, or spinning a perfectly balanced roulette wheel. 

The calculus of mathematical probabilities is purely concerned 
with finding the probability of composite events knowing those of 
the simple components, and as such it is an exercise in permutations 
and combinations, enumerating the chances for and against the 
composite event. There are two fundamental rules that should be
understood.

Rule I. The probability of occurrence of one or other of a number of mutually exclusive events, i.e. events only one of which can occur at a time, is the sum of the probabilities of the separate events. For example, it is easy to see that when throwing a die, two of the six equally likely chances favour the turning up of either a five or a six, and that the probability is 2/6 = 1/3, which is 1/6 + 1/6.

Rule II. The probability of the simultaneous occurrence of a number of independent events is the product of their separate probabilities. Thus, when throwing two dice, there are 36 equally likely possibilities, and of these, only one is favourable to a double six; the probability of a double six is 1/36, and this is 1/6 × 1/6.

The word independent has its ordinary English meaning in the above rules.

Sometimes the chances for and against an event are specified instead of the probability; a probability of m/(m + n) may be described by the statement "the chances are m to n for the event."
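Both rules can be checked by brute-force enumeration of the equally likely chances. This sketch uses Python's `fractions.Fraction` so that the probabilities come out as exact ratios rather than decimals; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # one perfect die: six equally likely chances

# Rule I: "a five or a six" on one die; the two outcomes cannot occur together.
p_five_or_six = Fraction(sum(1 for f in faces if f in (5, 6)), len(faces))

# Rule II: two dice; enumerate all 36 equally likely pairs of faces.
pairs = list(product(faces, repeat=2))
p_double_six = Fraction(sum(1 for a, b in pairs if a == b == 6), len(pairs))

print(p_five_or_six, p_five_or_six == Fraction(1, 6) + Fraction(1, 6))  # 1/3 True
print(p_double_six, p_double_six == Fraction(1, 6) * Fraction(1, 6))    # 1/36 True
```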

Statistical Probability 

2.12. The mathematical laws of probability find much use in the theory of statistics for calculating the relations between samples and populations. Superficially, and from the statistical standpoint, the drawing of a sample of men from the population of England is very like drawing [...] that the technique of sampling [...] made to satisfy the essential conditions for the application of probability. Then the occurrence of an event becomes the drawing of an individual of a given character, the set of equally likely chances becomes a population of equally likely individuals, and a given combination of events becomes a sample of given composition. Samples which follow the laws of mathematical probability are called random samples; they [...] the essential condition that every individual must be independent, i.e. its character must not be influenced in any way by the character of any other individual in the sample.

In the statistical view, the probability of drawing an individual of a given character is the proportion of individuals in the population having that character; statistical probability is a proportionate frequency. Conversely, any probability statement may be interpreted in those terms. When we say that the probability of a given horse winning a particular race is one-fifth, we may imagine a very large or infinite population of races run under the same conditions, in one-fifth of which the given horse wins.

This interpretation of probability is a little more concrete, and so a little more satisfying practically, than its most general description as a ratio expressing a degree of confidence or knowledge, or a ratio of chances, but it is not quite satisfactory. A population is rarely known, even when it is a bulk of corn that is being sampled; if it were known, there would be no question of sampling. However, it has been found in statistical experience that small samples from the same bulk vary among themselves considerably, but that as they increase in size they become more stable and less subject to sporadic variations. It is assumed that this continues indefinitely as the size of the sample increases, so that there is a single limiting form to which a sample from a given population tends as the sample increases in size. This limiting form the statistician conceives of as the infinite population; in more ordinary language it may be described as the result that would be obtained in the "long run" of experience.

The probability of a given kind of individual is the proportion of individuals of that kind in this infinite population. We do expect, in the long run, that events having given probabilities will occur and fail in nearly the relative proportions given by the probabilities.

* We are only dealing with instances in which the population is very large compared with the samples that are likely to be drawn.

The above postulate of the stability of proportionate frequencies in infinite samples cannot be proved, since, owing to the limitations of human patience and powers, it is impossible to increase the size of an actual sample indefinitely; but on it has been based the whole theory of statistical sampling, the extensive application of which has failed to reveal any inconsistencies that throw doubt on the basic postulate.

This will be discussed again in section 2.4.

The infinite population is seen to be distinct from the bulk that is being sampled, and is an abstraction whose physical existence cannot be shown; it is dependent on the sampling technique as well as on the bulk being sampled. Since interest lies in the characteristics of the bulk, it is important to arrange the sampling technique so that the infinite population and the bulk sampled are substantially the same.


2.13. The foregoing discussion of probability may be summed up roughly in the following terms. It is assumed that in the long run of experience the proportions of occurrences of different kinds of chance events will tend to have stable values, which define the infinite population. The long-run proportion of any one kind of event is its probability. For random sampling, the probabilities of composite events may be calculated from those of simple events by using the laws of mathematical probability. In practice, probability statements are interpreted in terms of proportionate frequencies in the long run of experience, even when used to measure a degree of subjective confidence that the event will occur. Thus, an event that happens frequently in the long run has a high probability, and on any single occasion we have a considerable degree of confidence that an event with a high probability will happen.

We shall now introduce some theoretical frequency distributions that are derived from the laws of mathematical probability, and describe some of their statistical applications.


Binomial Distribution

2.2. If we had four perfect dice and threw them repeatedly, noting the number of sixes that turned up, we should get, in different throws, four, three, two, one and zero sixes, and if we made enough throws we could form a frequency distribution in which the variate was the number of sixes per throw, or alternatively the number of "not-sixes," and the individuals were throws. This is the distribution dealt with in this section.

The imaginary experiment may be generalised by calling the casting
of a six the occurrence of an event or a success, the casting of a number
other than a six a non-occurrence or failure, and the throw of four


TABLE 2.1

Number of           Number of
Occurrences         Non-occurrences      Proportion of Sets
per Set             per Set

n                   0                    $p^n$
n − 1               1                    $np^{n-1}q$
n − 2               2                    $\frac{n(n-1)}{2!}p^{n-2}q^2$
n − 3               3                    $\frac{n(n-1)(n-2)}{3!}p^{n-3}q^3$
. . .               . . .                . . .
0                   n                    $q^n$
Total                                    1

dice a set of n trials (n = 4). Then if the probability of a success is p and that of a failure is q (p + q = 1), in a set of n independent trials the probability of (n − s) successes and s failures is*

$$\frac{n(n-1)\cdots(n-s+1)}{s!}\,p^{n-s}q^{s}. \qquad (2.1)$$

* This follows from the rules of mathematical probability given in section 2.11. The probability that (n − s) particular trials will be successes and s will be failures is $p^{n-s}q^{s}$ (Rule II). There are

$$\frac{n(n-1)\cdots(n-s+1)}{s!}$$

ways in which (n − s) successes and s failures may occur in a sample of n (this is a result from the theory of combinations), so from Rule I, the probability that any (n − s) trials will be successes and any s will be failures is the expression (2.1).
The probabilities for different values of s are set out in Table 2.1 in the form of a frequency distribution with proportionate frequencies. This is called the binomial frequency distribution because the proportionate frequencies are the terms in the expansion of the binomial $(p + q)^n$. For example, for the four perfect dice the distribution of sixes is the expansion of the binomial $(\frac{1}{6} + \frac{5}{6})^4$ and is shown in Table 2.2. The variate of this distribution is discrete, and it will be noted that all the proportionate frequencies may be described in terms of the two constants p (or q) and n.

TABLE 2.2

Number of Sixes . . .      4          3           2           1           0          Total
Proportion of Throws       1/1296     20/1296     150/1296    500/1296    625/1296   1
Percentage of Throws       0.08       1.54        11.57       38.58       48.23      100.00
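Expression (2.1) translates directly into code. The sketch below uses `math.comb` for the number of ways n(n − 1) . . . (n − s + 1)/s! and exact fractions for p and q, and reproduces the proportions and percentages of Table 2.2; `binomial_term` is an illustrative name, not a library function.

```python
from fractions import Fraction
from math import comb


def binomial_term(n, p, s):
    """Expression (2.1): probability of (n - s) successes and s failures."""
    q = 1 - p
    return comb(n, s) * p ** (n - s) * q ** s


p_six = Fraction(1, 6)  # a "success" is a six
# Table 2.2: number of sixes runs 4, 3, 2, 1, 0, i.e. s = 0, 1, 2, 3, 4 failures.
table = {4 - s: binomial_term(4, p_six, s) for s in range(5)}

for sixes, prob in table.items():
    print(sixes, prob * 1296, f"{float(prob) * 100:.2f}")
print(sum(table.values()))  # the proportions add up to 1
```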


It is a matter of algebra to calculate the moments of the general distribution given in Table 2.1; they are:

Mean number of occurrences, $\bar x = np$,

Second moment, $\mu_2 = npq = \bar x\left(1 - \frac{\bar x}{n}\right)$,

whence

Standard deviation, $\sigma = \sqrt{npq}$,

Third moment, $\mu_3 = npq(q - p)$, $\quad \beta_1 = \frac{(q - p)^2}{npq}$,

Fourth moment, $\mu_4 = npq\{1 + 3(n - 2)pq\}$, $\quad \beta_2 = \frac{1}{npq} + \frac{3(n - 2)}{n}$.     (2.2)

It will be noted that the second and fourth moments are symmetrical in p and q, i.e. the result is the same whether the occurrences or failures are the variate, and that for the third moment the interchange of p and q merely alters the sign.
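The formulae (2.2) can be verified exactly for any particular n and p by summing over all the terms of the distribution. This sketch does so for the four dice of Table 2.2, using exact fractions; running it again with p and q interchanged shows the symmetry just noted. `binomial_moments` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n, p):
    """Exact mean and central moments of the number of occurrences per set."""
    q = 1 - p
    probs = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]  # k occurrences
    mean = sum(k * w for k, w in enumerate(probs))
    mu = {s: sum((k - mean) ** s * w for k, w in enumerate(probs)) for s in (2, 3, 4)}
    return mean, mu


n, p = 4, Fraction(1, 6)  # the four dice of Table 2.2
q = 1 - p
mean, mu = binomial_moments(n, p)

# Compare with the closed forms of equations (2.2); each comparison is exact.
print(mean == n * p,
      mu[2] == n * p * q,
      mu[3] == n * p * q * (q - p),
      mu[4] == n * p * q * (1 + 3 * (n - 2) * p * q),
      mu[4] / mu[2] ** 2 == 1 / (n * p * q) + Fraction(3 * (n - 2), n))  # True x5
```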


Poisson Series 

2.3. A binomial distribution may be imagined in which the probability of a failure, q, is very small, that of a success, p, is nearly equal to unity, and the number of trials per set, n, is exceedingly large, so that the mean number of failures per set, m = nq, is of moderate dimensions. Then for any moderate value of s, s is negligible compared with n, and (n − s) may be equated to n. Doing this in expression (2.1), the probability of s failures is

$$\frac{n^s}{s!}\,p^{n}q^{s} = \frac{m^s}{s!}\left(1 - \frac{m}{n}\right)^{n},$$

and the limit of this as n approaches infinity is

$$\frac{e^{-m}m^{s}}{s!}, \qquad (2.3)$$

where e is the exponential base = 2.718 . . .

The expansion of this for different values of s is the Poisson Limit to the Binomial, or the Poisson Series, or the Law of Small Numbers.

This distribution is defined entirely by the one constant m, which is the mean. The other moments are given by the limits of equations (2.2) as n approaches infinity, and of these it is only important to note that the second moment, which may be written $m(1 - m/n)$, becomes equal to the mean.

To calculate the terms of the series for a given value of m, either the expression (2.3) may be evaluated, with the aid of logarithms, for s = 0, s = 1, s = 2 and so on, or Soper's tables in Pearson's collection of tables (1931) may be used. These tables give the values of expression (2.3) to six decimal places for values of m between m = 0.1 and m = 15.0, and within this range, for all values of s that have any proportionate frequency.
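The approach of the binomial to the Poisson Series can be seen numerically by holding m = nq fixed while n grows. This sketch compares the first few terms of expression (2.1), written for s failures, with expression (2.3), evaluating the exponential directly rather than with logarithms or tables; the value m = 4.68 is borrowed from Table 2.4 purely for illustration, and the function names are hypothetical.

```python
from math import comb, exp, factorial


def poisson_term(m, s):
    """Expression (2.3): proportion of sets with s failures when the mean is m."""
    return exp(-m) * m ** s / factorial(s)


def binomial_failures(n, q, s):
    """Expression (2.1) for s failures, each trial failing with probability q."""
    return comb(n, s) * (1 - q) ** (n - s) * q ** s


m = 4.68  # any moderate mean will do; this one reappears in Table 2.4
for n in (50, 1_000, 1_000_000):
    q = m / n  # hold the mean number of failures per set, nq, fixed at m
    print(n, [round(binomial_failures(n, q, s), 5) for s in range(4)])
print("Poisson", [round(poisson_term(m, s), 5) for s in range(4)])
```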


Use of the Binomial and Poisson Distributions for Testing
Randomness

2.4. One of the most elementary and direct experimental tests of
the assumption of an infinite population and of the applicability 
of the laws of mathematical probability to actual experience consists 
in seeing if the binomial distribution describes the variations in the 
numbers of successes in sets of trials. Dice have been thrown by 
different experimenters many thousands of times in the aggregate, 
and the resulting frequency distributions have been compared with 
the corresponding theoretical binomial distributions. Other experi- 
ments have taken the form of tossing coins, and thousands of runs 
of roulette wheels in casinos have been observed for this purpose. 


It cannot be said, however, that these experiments have either proved 
or disproved the laws of probability, for when there has been a 
discrepancy between theory and experiment, it has usually been 
possible to find reasons why the particular experiment failed and 
the general applicability of the laws of probability has remained 
unquestioned. We believe that these laws are no more to be proved 
or disproved experimentally than are (say) the terms in a Fourier 
analysis of a series. They are statements of mathematical relation- 
ships that are useful in analysing statistical sampling experience, and 
do in fact describe an important element in that experience. The 
justification for the laws is that they are useful in this way. This is 
the attitude in which the analyses of this section are regarded. 

In many collections of statistical data, individuals are of two kinds 


TABLE 2.3

Number of Seeds          Frequency of Rows
Germinated per Row       Actual       Expected

0                        6            6.9
1                        20           19.1
2                        28           24.0
3                        12           17.7
4                        8            8.6
5                        6            2.9
6                        –            0.7
7                        –            0.1
Total                    80           80.0


and divided into sets; the binomial distribution may be used to see if, for any particular collection, the individuals are independent and the two kinds occur at random in the sets. A special experiment to illustrate this was conducted by incubating 800 cabbage seeds on filter paper in rows of ten; after eight days the number of germinated seeds in each row was counted. These data were formed into a frequency distribution in which the variate was the number of germinated seeds per row and the individuals were rows; this is given in the second column of Table 2.3. If the germinated seeds were distributed at random among the rows, this distribution would be expected to be a binomial with n = 10. We do not know the probability p of a single seed germinating, but may estimate it from the relationship between the mean $\bar x$, n and p given in equations (2.2). The mean number of germinated seeds per row is (0 × 6 + 1 × 20 + . . . + 5 × 6) ÷ 80 = 2.175, and the estimate of p is therefore 2.175 ÷ 10 = 0.2175. The proportionate frequencies of the expected binomial distribution are given by the expansion of $(0.2175 + 0.7825)^{10}$, and these multiplied by 80 are the expected frequencies of Table 2.3. The agreement between the actual and expected frequencies is quite good,* and as far as may be judged from this limited experience, the variations in germinated seeds from row to row were random; the seeds were well mixed and independent and conditions were uniform, so that there was a constant probability of any one germinating.

It may be objected that since the expected distribution was 
derived from the actual by choosing p to make the two means equal, 
the closeness of agreement signifies nothing. This objection is not 
valid, however, since the two distributions have only been made to 
agree in two respects, mean and total frequency, and they have not 
been made to agree in form; that agreement is a consequence of 
randomness. 

Another test of randomness is that the various moments should bear the same relations to each other as those of the theoretical binomial distribution given in equations (2.2). The most important relationship is that between the second moment and mean. The second moment or variance of the actual distribution in Table 2.3, calculated by the technique of sections 1.31 and 1.32, is 1.744, and $\bar x(1 - \bar x/n)$, which may be termed the expected variance, is 1.702; again the agreement is good.
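The fitting procedure for Table 2.3 may be sketched in Python as below: estimate p from the mean, expand $(p + q)^{10}$, and compare the actual and expected variances. Rounded to one decimal, the expected frequencies from this sketch may differ from the printed column in the last figure (for two germinated seeds it gives about 23.9 where the table prints 24.0); the variable names are illustrative.

```python
from math import comb

# Table 2.3: number of germinated seeds per row of ten, and number of rows
counts = {0: 6, 1: 20, 2: 28, 3: 12, 4: 8, 5: 6}
n_per_row = 10
rows = sum(counts.values())                              # 80 rows

mean = sum(k * f for k, f in counts.items()) / rows      # 174/80 = 2.175
p = mean / n_per_row                                     # estimate of p
q = 1 - p

# Expected rows with k germinated seeds: 80 times the k-th binomial term.
expected = [rows * comb(n_per_row, k) * p ** k * q ** (n_per_row - k)
            for k in range(n_per_row + 1)]
variance = sum(f * (k - mean) ** 2 for k, f in counts.items()) / rows
expected_variance = mean * (1 - mean / n_per_row)

for k in range(8):
    print(k, counts.get(k, 0), round(expected[k], 1))
print(round(variance, 3), round(expected_variance, 3))
```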

* A criterion for judging the closeness of agreement between two distributions will be given in Chapter IV.

The almost classical example of a Poisson distribution is that given by counts of yeast cells in the squares of a haemacytometer. The liquid in which the yeast cells are suspended can be regarded as consisting of aggregates of molecules of the liquid about equal in size to the yeast cells, which are sparsely distributed among them. Then the probability that any aggregate taken at random is a yeast cell (the aggregate being a trial and the yeast cell a failure in the language of the theory) is extremely small; but there are very many such aggregates in the liquid under one square in the haemacytometer (i.e. in the set), so that the mean number of yeast cells per square is finite. Consequently, if the cells are distributed independently and at random through the suspending liquid, the frequency distribution of number per square should be the Poisson Series. Table 2.4 gives the distribution of the counts in 400 squares found by "Student" (1907). The mean number of cells per square is 4.68, and from Soper's tables the Poisson Series having a mean


TABLE 2.4

Number of Cells        Frequency of Squares
per Square             Observed      Expected

0                      –             3.71
1                      20            17.37
2                      43            40.65
3                      53            63.41
4                      86            74.19
5                      70            69.44
6                      54            54.16
7                      37            36.21
8                      18            21.18
9                      10            11.02
10                     5             5.16
11                     2             2.19
12                     2             0.86
13                     –             0.31
14                     –             0.10
15                     –             0.03
16                     –             0.01
Total                  400           400.00


(m) of the same value is constructed, the separate terms being multiplied by 400 to give the expected frequencies of the above table. Thus, for m = 4.6, 0.010 052 of the squares should have zero cells and for m = 4.7 this frequency is 0.009 095; hence by linear interpolation, for m = 4.68 it is 0.010 052 − 0.8 × 0.000 957 = 0.009 286, and this multiplied by 400 gives the expected frequency of 3.71. The agreement between the two distributions is quite good. The second moment of the observed distribution is 4.46, and is nearly equal to the mean.
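The Poisson fit of Table 2.4 can be sketched the same way. Here $e^{-m}$ is computed directly rather than read from Soper's tables, and the text's linear interpolation between m = 4.6 and m = 4.7 is reproduced for comparison; carried to full precision the interpolation gives about 0.009 287 rather than the 0.009 286 obtained from the rounded difference 0.000 957. Variable names are illustrative.

```python
from math import exp, factorial

# Table 2.4: observed number of squares with s yeast cells (no square had 0 or >12)
observed = {1: 20, 2: 43, 3: 53, 4: 86, 5: 70, 6: 54, 7: 37,
            8: 18, 9: 10, 10: 5, 11: 2, 12: 2}
squares = sum(observed.values())                          # 400 squares
m = sum(s * f for s, f in observed.items()) / squares     # 1872/400 = 4.68
variance = sum(f * (s - m) ** 2 for s, f in observed.items()) / squares

# Expected frequencies: 400 times the terms of expression (2.3)
expected = [squares * exp(-m) * m ** s / factorial(s) for s in range(17)]

# The text's alternative: linear interpolation in a table of e^{-m}
p0_interp = exp(-4.6) - 0.8 * (exp(-4.6) - exp(-4.7))
print(round(m, 2), round(variance, 2), round(expected[0], 2), round(p0_interp, 6))
```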
In general, the Poisson distribution would be expected to apply where events occur at random and under constant conditions in a medium (usually space or time) that may be divided into a number of equal zones, provided that relative to the size of the zone the medium is continuous (i.e. there is a very large number of elemental units of medium per zone) and the events are rare "points," i.e. there is room in each zone for an exceedingly large number of events although the actual number in any zone is moderate. Examples are the distribution of microscopic and ultra-microscopic particles and bacteria in liquids, the numbers of α-particles emitted from radioactive substances in intervals of time, and counts of weeds or pests in given areas of land in agricultural field trials.

When the binomial and Poisson distributions fit any given experi- 
mental data, it is inferred that the variations are due to chance and 
cannot be reduced or controlled unless the whole character of the 
data can be altered by introducing some method of selecting indi- 
viduals of a given kind. It does not always happen, however, that 
there is good agreement between theory and experiment. Where 
there are discrepancies, the plain fact is that the incidence of the 
event is not random, but the matter is not often allowed to rest 
there. The investigator usually tries to find the cause of the lack 
of randomness, and this extends beyond the realm of statistics. 
Sometimes the cause may be some fault in sampling technique that 
may be corrected ; sometimes the search for the cause may lead to 
a discovery of scientific value. A frequent kind of discrepancy is that 
in which the actual variance is greater than the expected, and this 
is usually interpreted by assuming that conditions are not uniform, 
regarding the probability of the event as varying from one set of 
trials or zone to another. Then the actual distribution is composed 
of several binomial or Poisson distributions superimposed, and the 
actual variance is equal to the expected variance plus that due to 
the fluctuations in the probabilities. In such instances, the experience 
is not merely dismissed as being non-random, but the variation is 
analysed into two parts, systematic and random. The principles of 
such analyses will be dealt with in Chapter VI. 

There are kinds of discrepancy other than that mentioned, 
and these lead to other analyses, all of which include a random 
element. 

Readers who are interested in further examples of the uses of the 
binomial and Poisson distributions are referred to the following 
references: Cochran (1936), Fisher (1936), Przyborowski and
Wilenski (1935), and Tippett (1934).


Normal Distribution

2.5. The Normal distribution may be derived mathematically as another limiting form of the binomial that is approached as n becomes very large, both p and q remaining finite. A binomial distribution may be represented by a histogram with each group centred over the value of the variate corresponding to the number of occurrences per set; there are (n + 1) groups and the outline of the diagram is of the characteristic stepped form. As n, and hence the number of groups, increases, it becomes necessary to reduce the scale of the variate to keep the diagram within reasonable dimensions, and so the steps in the outline become smaller. If this process continues indefinitely, it may easily be imagined that in the limit the steps in the outline coalesce to form a smooth curve; this is the frequency curve of the Normal or Gaussian distribution. Since it is a continuous curve, it necessarily has a continuous variate. Its equation may be written

$$y = \frac{N}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}},$$

where e is the exponential base = 2.718 28 . . . , x is the variate, and N, m and $\sigma$ are constants. It will be seen later why the equation is written in this particular form and the factors $\sqrt{2\pi}$ and $-\tfrac{1}{2}$ are not incorporated in the constants. The derivation of this equation from the binomial is purely an algebraic process.

This curve is that given in Fig. 2 for $\beta_1 = 0$, $\beta_2 = 3$, and in Fig. 3. It extends between $x = +\infty$ and $x = -\infty$, since only at those extremes does $y = 0$, and it is symmetrical about an ordinate at $x = m$, i.e. about the ordinate at O in Fig. 3.

This frequency curve is deduced from a histogram, and consequently areas under the curve, not heights of ordinates, represent frequencies. It is therefore appropriate to use the notation and ideas of the integral calculus: we imagine about any given value of x an elemental sub-range dx, and regard the area under the curve between ordinates drawn at the limits of the sub-range as an element of frequency, df (say). This elemental strip may then be regarded as a rectangle of height y, and

$$df = y\,dx = \frac{N}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx. \qquad (2.4)$$

[Fig. 3.]

It is now possible to find the various moments of (2.4). The total area under the curve is the total frequency, and integrating (2.4) between the limits $\pm\infty$ shows this to be N. Hence N in (2.4) is the total frequency. The other moments may be found from equation (1.3), p. 38, by substituting df for the frequency in the sub-group and integrating instead of summing. Thus the mean is

$$\frac{1}{N}\int_{-\infty}^{+\infty} x\,df = m,$$

so that the constant m is the mean of the distribution. The other moments may be found in the same way, and the first four are:

$$\text{mean} = m, \qquad \mu_2 = \sigma^2, \qquad \mu_3 = 0, \qquad \beta_1 = 0,$$
$$\mu_4 = 3\mu_2^2 = 3\sigma^4, \qquad \beta_2 = 3.$$
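These moments can be checked numerically. The sketch below integrates (2.4) by the midpoint rule instead of analytically. The values N = 1000, m = 5 and $\sigma = 2$ are arbitrary illustrative choices, and the function name `normal_moments` is hypothetical.

```python
import math

def normal_moments(N, m, sigma, steps=20_000, half_width=12.0):
    """Moments of the normal curve (2.4), found by midpoint-rule integration.

    The range m +/- half_width * sigma stands in for +/- infinity; the area
    outside it is negligible. Returns (total, mean, mu_2, beta_1, beta_2).
    """
    a = m - half_width * sigma
    h = 2 * half_width * sigma / steps
    xs = [a + (i + 0.5) * h for i in range(steps)]
    c = N / (math.sqrt(2 * math.pi) * sigma)
    dfs = [c * math.exp(-(x - m) ** 2 / (2 * sigma ** 2)) * h for x in xs]  # elements df

    total = sum(dfs)
    mean = sum(x * df for x, df in zip(xs, dfs)) / total

    def mu(r):
        # r-th moment about the mean
        return sum((x - mean) ** r * df for x, df in zip(xs, dfs)) / total

    mu2, mu3, mu4 = mu(2), mu(3), mu(4)
    return total, mean, mu2, mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

total, mean, mu2, beta1, beta2 = normal_moments(N=1000, m=5.0, sigma=2.0)
print(round(total, 6), round(mean, 6), round(mu2, 6), round(beta2, 6))
```

The integration should recover $N$, $m$, $\sigma^2$, $\beta_1 = 0$ and $\beta_2 = 3$ to within rounding error.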

These expressions for the mean and variance cannot be deduced from those given in (2.2) for the binomial distribution by writing $n = \infty$, for both would then become infinite; this is balanced in the derivation of the curve by the ultimate reduction in the scale of the variate, referred to above. The ratios $\beta_1$ and $\beta_2$, however, are pure numbers, and by putting $n = \infty$ in (2.2) they reduce to the values given above for the normal curve.
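The limiting behaviour of the $\beta$ ratios can be seen numerically. Equation (2.2) is not reproduced in this section, so the sketch assumes the standard binomial results $\beta_1 = (q-p)^2/npq$ and $\beta_2 = 3 + (1 - 6pq)/npq$. It first checks these against moments computed directly from the binomial probabilities, and then lets n grow with p = 0.2, an arbitrary choice.

```python
import math

def binomial_betas_formula(n, p):
    """beta_1 and beta_2 of the binomial from the standard closed forms (assumed)."""
    q = 1 - p
    npq = n * p * q
    return (q - p) ** 2 / npq, 3 + (1 - 6 * p * q) / npq

def binomial_betas_direct(n, p):
    """beta_1 and beta_2 computed directly from the binomial probabilities."""
    q = 1 - p
    probs = [math.comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    mu2, mu3, mu4 = (sum((k - mean) ** r * pk for k, pk in enumerate(probs))
                     for r in (2, 3, 4))
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

# As n increases, beta_1 -> 0 and beta_2 -> 3, the normal values.
for n in (10, 100, 10_000, 1_000_000):
    b1, b2 = binomial_betas_formula(n, 0.2)
    print(f"n = {n:>9}: beta_1 = {b1:.2e}, beta_2 = {b2:.6f}")
```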


Determination of Normal Frequencies 

2.51. Apart from the total frequency, N, any given normal distribution is completely characterised by the two constants m and $\sigma$. We are interested in calculating proportionate frequencies between various limits of x. This could be done by integrating (2.4) between the limits if the integral could be evaluated in finite terms. As it cannot, ordinates must be calculated and the integration performed by quadrature or by some graphical means.

For example, in Fig. 3 the proportionate frequency having values 
less than a value denoted by G on the variate scale is the area under 
the curve to the left of GH, expressed as a fraction of the total area
under the curve ; the proportion between G and E is the area between 
GH and EF ; the proportion greater than E is the area to the right 
of EF. This process of integration by quadrature is possible, but 
laborious. However, tables have been calculated, and these reduce 
the labour considerably. 

A complete set of tables for a range of values of m and $\sigma$ would be enormous, but a mathematical transformation reduces all normal curves to a standard form. The transformation consists in measuring the variate as a deviation from the mean, divided by the standard deviation. If we write

$$w = \frac{x - m}{\sigma}, \qquad (2.5)$$

then for a sample of 1 individual (N = 1), (2.4) becomes

$$df = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}w^2}\,dw = z\,dw. \qquad (2.6)$$

This may be termed the standardised form of the normal curve; it has a mean of zero and a standard deviation of unity. The probability integral at w is the area under the curve to the left of an ordinate drawn at w and may be written

$$A_w = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{w} e^{-\frac{1}{2}t^2}\,dt.$$
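Quadrature of the kind mentioned in section 2.51 is easy to sketch. The example below applies composite Simpson's rule to the ordinate z of (2.6). Simpson's rule is only one possible quadrature formula; the text does not prescribe one. The result is compared with Python's `math.erf` through the identity $A_w = \tfrac{1}{2}\{1 + \operatorname{erf}(w/\sqrt{2})\}$. The lower limit of −10 stands in for $-\infty$, since the area below it is negligible.

```python
import math

def z(w):
    """Ordinate of the standardised normal curve (2.6)."""
    return math.exp(-w * w / 2) / math.sqrt(2 * math.pi)

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def area_left_of(w, lower=-10.0):
    """Probability integral A_w by quadrature; the area below `lower` is ignored."""
    return simpson(z, lower, w)

for w in (0.0, 1.0, 2.0):
    exact = 0.5 * (1 + math.erf(w / math.sqrt(2)))
    print(w, round(area_left_of(w), 5), round(exact, 5))
```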
Very complete tables of $A_w$ and z for positive equally spaced values of w have been calculated by Sheppard and are included in Tables for Statisticians and Biometricians, Vol. I (Pearson, 1931), where our w is called x and our $A_w$ is called $\tfrac{1}{2}(1 + \alpha)$.* To find the probability integral for a negative value of w, we use the fact that the distribution (2.6) is symmetrical about an ordinate at $w = 0$. If $A_{-w}$ is the integral at $-w$ and $A_w$ is the integral at $+w$, it is easy to see that

$$A_{-w} = 1 - A_w. \qquad (2.7)$$

There are also tables that give values of w for equally spaced values of $A_w$, while others give w for equally spaced values of $2(1 - A_w)$. Let OE in Fig. 3 represent a deviation $+w$ and OG a deviation $-w$ (OE = OG, and since O is at the centre of symmetry it represents $w = 0$). Then the area to the left of EF is $A_w$ and the “tail” to the right is $(1 - A_w)$. Similarly, by symmetry, the area of the “tail” to the left of GH is $(1 - A_w)$, and so $2(1 - A_w)$ is the sum of the two “tails” beyond the deviations $+w$ and $-w$. The probability integral is sometimes tabulated in this form because it is of interest in sampling theory; this is the method of presentation used by Fisher (1936a).

To find the probability integral for any value of an actual variate (x in our notation), it is transformed to w by equation (2.5), and the corresponding integral is the one required. Thus, if m = 67.6976 inches and $\sigma$ = 2.7048 inches, and it is required to find the probability integral at x = 60.5 inches, say, then

$$w = \frac{60.5 - 67.6976}{2.7048} = -2.6610.$$

From Sheppard’s tables the value of the integral at a deviation of 2.66 is 0.996 09 and at 2.67 it is 0.996 21; so the first difference is 0.000 12, and by linear interpolation the integral for w = +2.6610 is

$$A_w = 0.996\,09 + 0.000\,12 \times 0.10 = 0.996\,10.$$

Hence, from equation (2.7), the integral for w = −2.6610 is 0.003 90.
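The interpolation above can be reproduced in a few lines. The two tabulated values, 0.996 09 and 0.996 21, are those quoted from Sheppard's tables. The exact integral from `statistics.NormalDist` (Python 3.8 or later) is included only as a cross-check.

```python
from statistics import NormalDist

m, sigma = 67.6976, 2.7048        # mean and s.d. of fathers' heights, in inches
x = 60.5
w = (x - m) / sigma               # equation (2.5); about -2.6610

# Sheppard's tabulated integrals at w = 2.66 and 2.67, as quoted in the text
A_266, A_267 = 0.99609, 0.99621
fraction = (abs(w) - 2.66) / 0.01             # position between the two entries
A_pos = A_266 + (A_267 - A_266) * fraction    # linear interpolation at +|w|
A_neg = 1 - A_pos                             # symmetry, equation (2.7)

exact = NormalDist(m, sigma).cdf(x)           # cross-check only
print(round(w, 4), round(A_neg, 5), round(exact, 6))
```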

The proportionate frequency between any two values of the variate may be found by taking the difference of the two corresponding integrals. Thus, in the above example the integral corresponding to x = 59.5 is found to be 0.001 22, and the proportionate frequency between 59.5 and 60.5 inches is 0.003 90 − 0.001 22 = 0.002 68.

* In this book we shall continue to use our own notation, even when referring to these tables.

A normal frequency distribution may be “fitted” to an actual distribution by putting m and $\sigma$ equal to the computed mean and standard deviation respectively, and then finding the frequencies in the sub-groups from the normal probability integrals. The proportionate frequency calculated in the above example is for the second group of the distribution of heights of fathers in Table 1.5 and was obtained in this way; the process is completed in the later columns of that table. Column (8) gives values of w corresponding to the limits of the sub-ranges, and column (9) gives the probability integrals. In column (10) these are converted to frequencies by multiplying by the total N = 1 078, and the normal or “expected” frequencies in column (11) are the differences of the values in the previous column. These may be compared with the actual frequencies in column (3). To plot the curve we must find the ordinates. Sheppard’s tables give values of the ordinates z of the standardised curve corresponding to the deviations w (see equation 2.6), and these are given in column (12) of Table 1.5. The ordinates of the actual curve are obtained by the transformation

$$y = \frac{N}{\sigma}\,z = 398.55\,z,$$

and these are in column (13). In Fig. 4 the curve drawn from these ordinates is superimposed on the histogram.
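The fitting procedure can be sketched as two small functions. The first mirrors columns (9) to (11): it converts probability integrals at the sub-range limits into expected frequencies. The second mirrors the transformation to column (13). N, m and $\sigma$ are the values quoted in the text. Only the limits 59.5 and 60.5 (the second group) are used, because the remaining limits of Table 1.5 are not reproduced here. The function names are hypothetical.

```python
from statistics import NormalDist

def expected_frequencies(limits, N, m, sigma):
    """Normal ("expected") frequencies in the sub-ranges bounded by successive limits."""
    dist = NormalDist(mu=m, sigma=sigma)
    cumulative = [N * dist.cdf(x) for x in limits]                  # like column (10)
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]  # like column (11)

def fitted_ordinate(x, N, m, sigma):
    """Ordinate y = (N / sigma) * z of the fitted curve at x, as in column (13)."""
    w = (x - m) / sigma                  # equation (2.5)
    return N / sigma * NormalDist().pdf(w)

N, m, sigma = 1078, 67.6976, 2.7048
print(expected_frequencies([59.5, 60.5], N, m, sigma))   # second group of Table 1.5
print(round(N / sigma, 2))                               # the factor 398.55
```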


Sometimes a frequency or proportionate frequency is given, and it is desired to find the corresponding value of the variate. The value of the standardised variate w may be found from Sheppard’s tables, and hence, knowing m and $\sigma$, the actual variate may be calculated from equation (2.5).

For example, suppose we require, for the data of Table 1.5, the limit of height such that 20 per cent of the fathers are shorter than the limit and 80 per cent are taller. We assume the distribution to be normal, with mean and standard deviation equal to the values already computed. Then A = 0.2, and since this is less than 0.5 it corresponds to a negative value of w; we must therefore first find w corresponding to A = 1 − 0.2 = 0.8. From Sheppard’s tables, the value of the variate at which A = 0.799 546 is 0.84, and the value at which A = 0.802 338 is 0.85. Hence, by linear interpolation, the value at which A = 0.8 is

$$w = \frac{0.8 - 0.799\,546}{0.802\,338 - 0.799\,546} \times 0.01 + 0.84 = 0.841\,63.$$

Hence the value of w at which A = 0.2 is −0.841 63, and if this is substituted in (2.5) the limit of height is found to be 65.421 inches.
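The inverse calculation can be sketched the same way. The code interpolates between the two quoted table entries, inverts (2.5) as $x = m + w\sigma$, and cross-checks against `NormalDist.inv_cdf`. The cross-check is an exact inverse from Python's standard library, not the method used in the text.

```python
from statistics import NormalDist

m, sigma = 67.6976, 2.7048   # mean and s.d. of the fathers' heights (Table 1.5)

# Linear interpolation between Sheppard's entries quoted in the text:
# A = 0.799 546 at w = 0.84 and A = 0.802 338 at w = 0.85.
w_upper = 0.84 + (0.8 - 0.799546) / (0.802338 - 0.799546) * 0.01
w_lower = -w_upper                           # by symmetry, A = 0.2 at w = -w_upper
limit_interpolated = m + w_lower * sigma     # invert (2.5): x = m + w * sigma

# Cross-check with the exact inverse of the probability integral.
limit_exact = NormalDist(m, sigma).inv_cdf(0.2)
print(round(w_upper, 5), round(limit_interpolated, 3), round(limit_exact, 3))
```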



Relation between Normal Frequencies and Standard Deviation 

2.52. We cannot here give a full table of the probability integral of the normal distribution, but a few important values are given in Table 2.5. The first entry in this table is obvious: the ordinate at w = 0 is the axis of symmetry of the curve, and the area up to this ordinate is half the total area.

The second entry of the variate is the quartile deviation; the proportionate frequency between positive and negative values of this deviation is half the total. The ordinates GH and EF drawn in Fig. 3 are at values of the variate corresponding to w = +2.0 and −2.0, and the proportionate area of the two “tails” beyond these limits is 0.045 50. That is, about 5 per cent of the individuals in a normal population deviate from the mean by twice the standard deviation or more. Similarly, from the last entry in Table 2.5 we see that 99.73 per cent of the individuals are contained within a total range of six times the standard deviation. These data will help readers appreciate the significance of the standard deviation as a measure of variability when applied to a distribution that is approximately normal.

The proportional relations between the standard deviation and other measures of dispersion given above in section 1.23 are true only for normal distributions, although they are roughly applicable also to many distributions that are approximately normal.

TABLE 2.5
Normal Distribution

Variate     Probability Integral     Sum of Two “Tails”
$w$         $A_w$                    $2(1 - A_w)$
0           0.500 00                 1.000 00
0.674 49    0.750 00                 0.500 00
1.0         0.841 34                 0.317 31
2.0         0.977 25                 0.045 50
2.6         0.995 34                 0.009 32
3.0         0.998 65                 0.002 70
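The entries of Table 2.5 can be regenerated from the probability integral with `statistics.NormalDist` from Python's standard library (3.8 or later). This is a check on the table, not the way it was originally computed. The function name `table_2_5` is hypothetical.

```python
from statistics import NormalDist

def table_2_5(ws=(0.0, 0.67449, 1.0, 2.0, 2.6, 3.0)):
    """Rows (w, A_w, 2(1 - A_w)) of the standardised normal probability integral."""
    std = NormalDist()                 # mean 0, standard deviation 1
    rows = []
    for w in ws:
        a = std.cdf(w)                 # probability integral A_w
        rows.append((w, a, 2 * (1 - a)))   # sum of the two tails beyond +/- w
    return rows

for w, a, tails in table_2_5():
    print(f"{w:7.5f}  {a:.5f}  {tails:.5f}")
```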

Practical Applicability of the Normal Distribution 

2.53. The Normal distribution is a continuous curve, and first we
must discuss the applicability of frequency curves in general to 
practical data. 

It is assumed that for most infinite populations with a continuous 
variate, the frequency distribution may be represented by a con- 
tinuous frequency curve. This is an extrapolation of the practical 
experience that as the size of the sample is increased, the sub-ranges 
of the distribution may be reduced and the outline of the histogram 
usually becomes more regular. Where a form of curve can be assumed,
the parameters may be estimated from a large sample, and a curve 
be fitted to the actual distribution, as has been done for Table 1.5. 

The normal curve has also been deduced mathematically as the 
distribution that results from the combination of an infinite number 
of very small random errors, and this fact together with the fact 
that the distribution is a special case of the binomial gives it a 
fundamental status. It is believed by some to represent the devia- 
tions due to experimental errors in measurements of a physical 
constant, and a quantity that is distributed according to this law is 
sometimes held to be subject only to chance causes that cannot be 
controlled. Conversely, where a distribution is skew, it is often 
inferred that superimposed on the chance variations are some larger 
ones due to a few important causes that may be controlled. We are 
dubious of this use of the normal distribution. Doubtless normality 
may be made a definition, and hence a test, of “randomness” just 
in the way that conformity to the binomial and Poisson distributions 
has been interpreted, but it is doubtful if for the normal curve such 
a course has any practical significance. Frequently, when man has 
done all he can to control the variation in a character and the 
remaining variations are practically random, the resulting distribu- 
tion is far from normal. Experimental errors in physical measure- 
ments are by no means always normally distributed, and some 
quantities like the strength of elements of various natural and 
manufactured materials have essentially skew frequency distribu- 
tions. It may also happen that the distribution of a quantity is to 
all intents and purposes normal, and yet a further degree of control 
may be possible. On the whole, we doubt if the normal distribution 
is of more than empirical value ; it is a convenient means of describing 
any data it happens to fit. 

In presenting this view of the place occupied by the normal curve 
in the statistical scheme, it would be a mistake to underestimate its 
importance. The curve does in fact represent closely a very large 
number of experimental distributions and approximately represents 
many more. It is important, too, because it is the distribution on 
which most of the statistical theory of errors is based. 

Non-normal Curves

2.6. There are several systems of smooth curves for describing data that do not follow the normal law, of which Pearson’s is probably the most useful; all the curves in Fig. 2 belong to this system. The type is decided from the values of the two $\beta$ ratios, and these, together with the standard deviation and mean, are then used to find the unknown parameters in the equation. In practical work, however, fitting such curves seldom serves any useful purpose (their chief use is in sampling theory and in actuarial work), so we will merely refer the interested reader to some such book as Elderton’s Frequency Curves and Correlation.

Sampling Distributions 

2.7. A sampling distribution results if a large number of finite random samples are taken from an infinite population, some single characteristic of each sample is computed, and the computed values are formed into a frequency distribution. For example, the observations of Table 1.3 may be regarded as forming twenty samples of five observations from one population. The means of these samples are given in Table 1.3 and are formed into a frequency distribution in Table 2.6. This table also gives the distribution of the individual observations.

TABLE 2.6
Frequency Distributions

                         27-  30-  33-  36-  39-  42-  45-  48-  51-  54-  57-  60-  63-  66-  69-  72-  Total
Individual observations  4    2    1    3    7    7    10   17   13   13   8    7    3    1    3    1    100
Means of 5                                         3    4    5    2    5    1                             20

The distribution of means is called the sampling distribution of the mean. Like any other frequency distribution, it has a mean and a standard deviation. The standard deviation of such a distribution is called the standard error of the mean.

Other constants of the samples of five could have been calculated, for example the standard deviations or the medians. Each of these would have had its own sampling distribution and standard error.

Most sampling distributions in common use may be deduced 
mathematically by applying the laws of probability, but given 
sufficient time and energy one could determine them experimentally 
by making up an artificial population (say) by writing numbers on 
cards, shaking up in a bag, drawing samples and calculating the 
means or other statistical constants much in the way in which 
Table 2.6 was constructed from Table 1.3, except that the scale of 
the experiment would need to be much larger. This kind of technique is actually used when the distribution cannot be deduced
mathematically. 
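The card-drawing experiment described above can be imitated with a pseudo-random number generator in place of cards in a bag. The sketch below draws samples of five from a hypothetical normal population (mean 50, standard deviation 9, values chosen only for illustration). It then compares the standard deviation of the sample means with $\sigma/\sqrt{N}$, the result stated in section 2.71.

```python
import math
import random
import statistics

def sample_means(pop_mean, pop_sd, N, n_samples, seed=1):
    """Means of n_samples random samples of size N from a normal population."""
    rng = random.Random(seed)          # fixed seed so the experiment is repeatable
    return [statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(N))
            for _ in range(n_samples)]

means = sample_means(pop_mean=50.0, pop_sd=9.0, N=5, n_samples=20_000)
print("mean of means:  ", round(statistics.fmean(means), 3))
print("s.d. of means:  ", round(statistics.pstdev(means), 3))
print("sigma / sqrt(N):", round(9.0 / math.sqrt(5), 3))
```

With 20 000 samples, the empirical standard error should agree with $9/\sqrt{5} \approx 4.02$ to about two decimal places.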

Sampling Distribution of Mean 

2.71. We are particularly concerned here with the sampling distribution of the mean in samples of size N drawn from an infinite population in which the individuals are normally distributed. This is itself a normal distribution, with a mean equal to the mean of the individuals in the population and a standard error of

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}},$$

where $\sigma$ is the standard deviation of the individuals.

It will be noticed that as N increases, the standard error of the mean decreases. This is shown in Fig. 5, which gives the sampling distributions of the means of samples of 1, 4, 16, 25 and 100 from a population in which the individuals are normally distributed with an unspecified standard deviation. The population distribution itself is the curve in Fig. 5 for N = 1. For the larger samples, there is less dispersion about the population mean. This is the statistical demonstration of the common experience that a large sample gives a more accurate representation of the population than a small one; the increase in precision is measured by a reduction in standard error.

The normal probability integral may be used to calculate proportionate frequencies of samples having means within given limits, and Table 2.5 may be used by reading “deviation of sample mean from population mean, divided by standard error” for the variate w. Thus only 0.045 50, or about 1 in 20, of the possible samples have means that differ from the population value by more than twice the standard error.
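The proportion of sample means lying more than two standard errors from the population mean does not depend on N, even though the standard error itself does. The sketch below checks this with `statistics.NormalDist` for the sample sizes of Fig. 5. The population standard deviation of 3.0 is a hypothetical value.

```python
import math
from statistics import NormalDist

def proportion_beyond(k, sigma, N):
    """Proportion of sample means farther than k standard errors from the population mean."""
    se = sigma / math.sqrt(N)                 # standard error of the mean
    dist = NormalDist(mu=0.0, sigma=se)       # deviation of the sample mean
    return 2 * dist.cdf(-k * se)              # both tails

for N in (1, 4, 16, 25, 100):
    se = 3.0 / math.sqrt(N)
    print(f"N = {N:>3}: standard error = {se:.3f}, "
          f"proportion beyond 2 s.e. = {proportion_beyond(2, 3.0, N):.5f}")
```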

Deduction of Sampling Distributions of Mean and Standard Deviation 

2.72. The sampling distributions of both the mean and standard
deviation may be deduced together. The proof is taken from a 
paper by Irwin (1931). 

[Fig. 5. Normal Sampling Distributions]

Let the population mean be $\xi$, with the other particulars as stated above. Also let the variate be x, and let the values in any one sample of N be $x_1, x_2, \ldots, x_s, \ldots, x_N$. Since the population distribution is continuous, we cannot state the probability of a value $x_s$, for an ordinate of the frequency curve drawn at $x_s$ has no area. The probability of a value lying within an elemental range $dx_s$ about a value $x_s$, however, is

$$\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_s-\xi)^2}{2\sigma^2}}\,dx_s.$$

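Applying the second rule of probability, the probability of a sample having values lying within ranges $dx_1, dx_2, \ldots$ of $x_1, x_2, \ldots$, which we may describe briefly as “the probability of the sample,” is

$$\prod_{s=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_s-\xi)^2}{2\sigma^2}}\,dx_s$$

or

$$\frac{1}{(2\pi)^{N/2}\sigma^N}\, e^{-\frac{1}{2\sigma^2}\left\{(x_1-\xi)^2 + (x_2-\xi)^2 + \cdots + (x_N-\xi)^2\right\}}\,dx_1\,dx_2 \cdots dx_N.$$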

Now if $\bar{x}$ and s are the mean and standard deviation in the sample,* then according to sections 1.23 and 1.31,

$$x_1 + x_2 + \cdots + x_N = N\bar{x}, \qquad x_1^2 + x_2^2 + \cdots + x_N^2 - N\bar{x}^2 = Ns^2.$$

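The sum in the exponent of the above expression expands to $\sum x_s^2 - 2\xi\sum x_s + N\xi^2$. Substituting these relations in it, it becomes $N\{(\bar{x}-\xi)^2 + s^2\}$, and the probability of the sample is

$$\frac{1}{(2\pi)^{N/2}\sigma^N}\, e^{-\frac{N\{(\bar{x}-\xi)^2 + s^2\}}{2\sigma^2}}\,dx_1\,dx_2 \cdots dx_N.$$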

Many of the possible samples from this population will have a mean value $\bar{x}$ and a standard deviation s. To find the probability of a sample with these two constants lying within ranges $d\bar{x}$ and ds of $\bar{x}$ and s, we must apply the first probability rule and integrate the above expression over all samples having these two constants. This mathematical step involves a transformation to polar co-ordinates in N-dimensional space and a subsequent integration. It leads to the following expression for the probability of a mean of $\bar{x}$ and a standard deviation of s:

$$df = K\, s^{N-2}\, e^{-\frac{N\{(\bar{x}-\xi)^2 + s^2\}}{2\sigma^2}}\,d\bar{x}\,ds,$$

where K is a constant. This is the probability, or proportionate frequency, distribution of the two statistical constants. Since the term containing $\bar{x}$ does not contain s, and vice versa, the converse of the second rule of probability implies that $\bar{x}$ and s have independent probabilities, and the two distributions may be written separately:

$$df = K_1\, e^{-\frac{N(\bar{x}-\xi)^2}{2\sigma^2}}\,d\bar{x} \qquad \text{and} \qquad df = K_2\, s^{N-2}\, e^{-\frac{Ns^2}{2\sigma^2}}\,ds. \qquad (2.8)$$

By integrating the first expression over the whole range and equating the result to unity, the constant $K_1$ is found to be

$$K_1 = \frac{\sqrt{N}}{\sqrt{2\pi}\,\sigma}.$$

The distribution of the mean is therefore a normal one with mean equal to $\xi$ and standard deviation equal to $\sigma/\sqrt{N}$. That of the standard deviation is more complicated.

* For the first time it is necessary to distinguish clearly between the population value of a statistical constant and the value estimated from a sample. Usually we shall denote population values by Greek letters and the sample values by the corresponding italic letters. This is why we have used $\xi$ instead of m in the equation for the normal distribution.
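Two parts of this result lend themselves to a quick numerical check. Integrating the first expression of (2.8) checks the value of $K_1$, and a simulation illustrates the independence of $\bar{x}$ and s. The sketch assumes a hypothetical population with $\xi = 10$ and $\sigma = 2$, and samples of N = 6. It also uses the standard fact, not derived in the text, that $Ns^2/\sigma^2$ (with s computed using divisor N) has mean N − 1. A near-zero correlation between $\bar{x}$ and s is consistent with independence but does not by itself prove it.

```python
import math
import random
import statistics

def check_k1(N, sigma, xi=0.0, steps=20_000):
    """Integrate K_1 exp(-N (xbar - xi)^2 / 2 sigma^2) numerically; should give 1."""
    k1 = math.sqrt(N) / (math.sqrt(2 * math.pi) * sigma)
    se = sigma / math.sqrt(N)
    a, h = xi - 12 * se, 24 * se / steps          # +/- 12 standard errors
    return sum(k1 * math.exp(-N * (a + (i + 0.5) * h - xi) ** 2 / (2 * sigma ** 2))
               for i in range(steps)) * h

def simulate(xi, sigma, N, n_samples, seed=7):
    """Sample means and standard deviations (divisor N) of repeated normal samples."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(n_samples):
        xs = [rng.gauss(xi, sigma) for _ in range(N)]
        means.append(statistics.fmean(xs))
        sds.append(statistics.pstdev(xs))     # N s^2 = sum of squared deviations
    return means, sds

def correlation(u, v):
    """Product-moment correlation coefficient of two equal-length lists."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))

xi, sigma, N = 10.0, 2.0, 6
means, sds = simulate(xi, sigma, N, 20_000)
chi = statistics.fmean(N * s * s / sigma ** 2 for s in sds)
print("integral of first expression:", round(check_k1(N, sigma), 6))
print("mean, s.d. of xbar:", round(statistics.fmean(means), 3),
      round(statistics.pstdev(means), 3), "vs", round(sigma / math.sqrt(N), 3))
print("correlation(xbar, s):", round(correlation(means, sds), 4))
print("mean of N s^2 / sigma^2:", round(chi, 3), "vs", N - 1)
```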



CHAPTER III 


ERRORS OF RANDOM SAMPLING AND STATISTICAL INFERENCE

Much statistical work is concerned with attempts to learn some- 
thing of the characteristics of populations from measurements made 
on finite representative samples. It is a fact of experience that suc-
cessive samples from one population usually differ amongst them- 
selves, and therefore in general they differ from the population. 
They are subject to the so-called errors of sampling that bring 
uncertainty into any inferences that may be made regarding the 
population. This chapter is concerned with the theory of statistical 
inference, taking account of these errors; the applications of the 
theory are described for large samples only. 

Random and Representative Samples 

3.1. We shall deal only with simple random samples, in which, as stated in section 2.13, every individual is independent of every other. A sample of 1 000 men consisting of 500 pairs of brothers is not random, for there is a tendency for brothers to be alike, and there are only 500 independent individuals; the others are related, statistically as well as by blood, to the first 500. Special precautions are necessary in practice to obtain satisfactory samples. Bales of cotton, sacks of corn and the like are seldom uniform, and a small quantity taken from one place is not representative of the whole. In such circumstances it is necessary to select the individuals one at a time, but where the bulk is well mixed and statistically homogeneous, a group of individuals taken from any part is effectively a random sample.

The theory of sampling describes the relations between samples 
and the infinite population, and this last will differ from the actual 
population or bulk if the sample is biased. In practice, it is the 
bulk that is of interest, and it is desirable to make the sample a
representative one by seeing that every individual in the bulk has 
an equal chance of being included. This is not always easy. For 
example, when taking cotton hairs singly from a bale, there is an un- 
avoidable tendency to take too many of the longer hairs. The method 
of drawing individuals must be independent of their character. 
The degree of elaboration necessary in a sampling technique to
ensure randomness and representativeness depends on the nature 
of the material being sampled, and the arrangement of a technique 
is more a matter for the experimentalist than for the statistician. 
Readers are advised, however, that investigation in particular fields 
has often shown that much more care in sampling is required than 
appeared to be necessary at first sight. Although it is impossible to 
give general rules, the following description of two particular 
sampling methods may be useful as illustrations. 

The first method is that applied to sampling cotton hairs. Hairs 
are taken from different parts of the bale, not singly but in tufts 
sufficiently large to avoid the bias towards length. Each tuft is then 
divided roughly into two halves and one half is discarded. The 
other is halved again, one half is discarded and the other is again 
divided. This is repeated until the final reduced tuft contains only 
a few hairs. A number of such reduced tufts is combined to form 
the final sample, every hair of which is measured. If at each division 
the half to be discarded is decided by lot, say by the toss of a coin, 
the final sample will be truly representative. But it will not be a 
simple random sample unless the final reduced tufts contain only 
one hair each. However, we shall show how to treat such complex 
samples in section 10.11.
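The halving procedure can be sketched as a loop. In this hypothetical model a tuft is a list of hair lengths, and each division splits the list into two halves of as nearly equal size as possible. A simulated coin toss (Python's `random` module) decides which half is kept. The names, the made-up hair lengths and the stopping size are illustrative assumptions. Setting `final_size=1` corresponds to the text's condition for a simple random sample.

```python
import random

def reduce_tuft(tuft, final_size, rng):
    """Halve a tuft repeatedly, keeping the half chosen by a coin toss,
    until no more than final_size hairs remain."""
    if final_size < 1:
        raise ValueError("final_size must be at least 1")
    hairs = list(tuft)
    while len(hairs) > final_size:
        middle = len(hairs) // 2
        first, second = hairs[:middle], hairs[middle:]
        hairs = first if rng.random() < 0.5 else second   # the toss of a coin
    return hairs

def sample_from_bale(tufts, final_size, seed=0):
    """Combine the reduced tufts into the final sample; every hair is then measured."""
    rng = random.Random(seed)
    sample = []
    for tuft in tufts:
        sample.extend(reduce_tuft(tuft, final_size, rng))
    return sample

# Hypothetical tufts of 64 hairs each (lengths in mm, made up for illustration)
tufts = [[20 + (7 * i + 3 * j) % 25 for j in range(64)] for i in range(5)]
sample = sample_from_bale(tufts, final_size=3)
print(len(sample), sample)
```

Each 64-hair tuft is halved to 32, 16, 8, 4 and finally 2 hairs, so the combined sample contains 10 hairs.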

The second method may often be applied when the population 
to be sampled is large but finite. Suppose we wish to inspect a 
random sample of houses in a large town. The town may be divided 
into wards, and the wards into streets. One ward may be chosen at 
random, a street may be chosen at random from those in the ward, 
and a house from those in a street; this house is a randomly chosen 
individual, and if the whole process is repeated several times, the 
resulting houses are a random sample of the town, provided the 
wards and streets are approximately equal in size. If the wards, the 
streets within each ward, and the houses in each street are all
systematically numbered, the problem of random selection resolves 
itself into one of choosing a number at random. This process is 
facilitated by Random Sampling Numbers (Tippett, 1927), which 
is a collection of 40,000 numbers from 0 to 9, arranged in random
order. By combining an appropriate number of digits taken any- 
where from this collection, random numbers of almost any magnitude 
may readily be obtained. 
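Here is a sketch of the multi-stage selection, with a pseudo-random digit stream standing in for the printed table of Random Sampling Numbers. Digits are combined k at a time to make a k-digit number. Numbers too large for the list being sampled are rejected and fresh digits are read. This rejection step is one common way of keeping every choice equally likely; it is an assumption here, not a rule from the text. The town data are hypothetical. As the text notes, the result is a random sample of houses only if the wards and streets are roughly equal in size.

```python
import random

def digit_stream(seed):
    """Endless stream of random digits 0-9 (a stand-in for a printed table)."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(10)

def choose(n, digits):
    """Choose an index in range(n) by combining random digits, rejecting values >= n."""
    if n == 1:
        return 0
    k = len(str(n - 1))                  # digits needed to cover 0 .. n-1
    while True:
        value = 0
        for _ in range(k):
            value = value * 10 + next(digits)
        if value < n:
            return value

def pick_house(town, digits):
    """One ward at random, one street from that ward, one house from that street."""
    wards = sorted(town)
    ward = wards[choose(len(wards), digits)]
    streets = sorted(town[ward])
    street = streets[choose(len(streets), digits)]
    house = choose(town[ward][street], digits) + 1    # houses numbered from 1
    return ward, street, house

# Hypothetical town: ward -> {street: number of houses}
town = {
    "North": {"High Street": 120, "Mill Lane": 95},
    "South": {"Church Road": 110, "Station Road": 130},
    "East": {"Park Avenue": 105, "Bridge Street": 100},
}
digits = digit_stream(seed=2024)
sample = [pick_house(town, digits) for _ in range(10)]
for house in sample:
    print(house)
```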

It should be noted that if arrangements are made to represent 
each ward equally in the sample or to see that no two houses are
in the same street, the sample is representative but not random. 
Such may be called a stratified sample, and its random errors are 
dealt with in section 10.13. 

The theory of random sampling is sometimes applied to experi- 
mental errors in physical determinations. It is not unreasonable to 
regard the limited number of measurements that may actually be 
made of a physical quantity as a random sample of an infinite 
population of measurements that could conceivably be made under 
the same conditions. The average of this population, however, is 
not necessarily the true value of the quantity; it is affected by errors 
of bias due to the particular conditions of the experiment, to such 
factors as the personal error of the observer and idiosyncrasies of 
the apparatus. Since in most physical experiments errors of bias 
are liable to be of the same order of magnitude as random errors, 
the theory of random sampling is of very limited utility in this field. 

The qualities of randomness and representativeness do not depend 
on the size of the sample; a sample of five, if properly chosen, is 
as random and representative as one of five thousand. In this chapter 
we are limiting, not the general theory, but the application of the 
theory to large samples, i.e. samples of at least a few hundreds, and 
this is done to simplify matters. The more exact theory for small 
samples will be given in Chapter V. 

We shall again make the further assumption that the population 
sampled is infinite in the sense that its composition is not appre- 
ciably altered by the abstraction of the sample. This is satisfied,
practically, if the individuals are drawn one at a time from even a 
very small bulk provided they are returned to the bulk and well 
mixed with the other individuals between each draw. 

Tests of Significance 

3.2. Many scientific investigations proceed by framing working hypotheses and testing them experimentally. As long as the experiments fail to disprove them, the hypotheses are accepted. Many statistical inferences are made by this general method. A hypothetical population of certain characteristics is postulated, and if the sample is such that it could reasonably have come from that population, the hypothesis is accepted. Owing to sampling errors, however, there is no sharp dividing line between samples that could have come from the hypothetical population and those that could not. It is only possible to give a probability that a sample like the one observed could have come from the population. If the probability is low, the hypothesis is rejected; if it is high, the hypothesis is accepted and the deviation between the sample and the postulated population is attributed to errors of sampling. The sample may be characterised by one or more statistical constants, and in the next paragraph the process will be illustrated for the mean of samples.

If there is some ground for assuming* the population to be normal with a given mean and standard deviation, the distribution of the means of samples of any given size is the sampling distribution described in section 2.71. From its probability integral we may deduce the probability of a sample mean deviating from the population mean by more than any given value. For example, we see from Table 2.5 that the probability of a sample mean differing from the population mean by three or more times the standard error, in either direction, is 0.002 70. This is low. If an actual sample differs in mean from the population value by more than this amount, we infer that the deviation cannot reasonably be attributed to random errors, and the hypothesis regarding the population has to be abandoned.

The probability of a random deviation exceeding any given value is called the level of significance of the deviation, and is often expressed as a percentage. For example, a deviation of ±3 times the standard error is on the 0.27 per cent. level of significance, and one of 1.0 times the standard error is on the 32 per cent. level.
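The level of significance of a deviation can also be computed directly rather than read from Table 2.5. For a normal sampling distribution, the two-tailed probability beyond ±k standard errors equals $\operatorname{erfc}(k/\sqrt{2})$, so the sketch uses `math.erfc`. The function name is hypothetical.

```python
import math

def level_of_significance(deviation, standard_error):
    """Probability that a random normal deviation is at least |deviation| in size (both tails)."""
    k = abs(deviation) / standard_error
    return math.erfc(k / math.sqrt(2))   # equals 2 * (1 - A_k)

for k in (1.0, 2.0, 3.0):
    print(f"{k} s.e.: {100 * level_of_significance(k, 1.0):.2f} per cent")
```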

By way of example, we will treat the lengths of 4 000 hairs of an Indian cotton given by Koshal and Turner (1930) as an infinite population.† Their mean length is 2.33 cm. and the standard deviation is 0.480 6 cm. The first thousand hairs were selected by a different method from the rest and gave a mean of 2.54 cm. Is this deviation compatible with the hypothesis that the 1 000 are a random sample from the 4 000 and that the difference in means is due to random errors? Or is the difference large enough to indicate that the change in technique has had an effect? The standard error of the mean is

$$\frac{0.480\,6}{\sqrt{1\,000}} = 0.015\,2,$$

and the deviation of 0.21 cm. is over 13 times the standard error. Sheppard’s normal probability tables show that this is far beyond the 0.000 001 level of significance, and the hypothesis is untenable.

* There is no fundamental distinction between a hypothesis and the assumptions (e.g. normality). However, a good test is sensitive to falseness in the hypothesis, which is what we want to test, and comparatively insensitive to errors in the assumptions, which are subsidiary.

† This assumption is an approximation, for 4 000 is not infinitely large compared with 1 000, the size of the sample we are testing.
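Here is the arithmetic of this example, sketched in Python. As in the text (see the footnote), the 4 000 hairs are treated as an infinite population, so no allowance is made for the sample being a quarter of them. `math.erfc` is used for the tail probability because, for a deviation this large, it remains accurate where `1 - cdf` would round to zero.

```python
import math

pop_mean, pop_sd = 2.33, 0.4806     # the 4 000 hairs, treated as an infinite population (cm)
n, sample_mean = 1000, 2.54         # the first thousand hairs (cm)

se = pop_sd / math.sqrt(n)                          # standard error of the mean
k = (sample_mean - pop_mean) / se                   # deviation in standard errors
p_two_tailed = math.erfc(abs(k) / math.sqrt(2))     # level of significance

print(f"standard error = {se:.4f} cm, deviation = {k:.1f} standard errors")
print(f"level of significance = {p_two_tailed:.1e}")
```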

Any deviation that is large enough and is on a sufficiently low
probability level to lead to a rejection of the hypothesis regarding
the population is said to be statistically significant and the mean of
the sample is said to be significantly different from that of the
population. The question arises: at what probability level does a deviation
become significant? There is no rational probability level at which
possibility ceases and impossibility begins, but it is conventional
to regard a probability of 0·05 as the critical level of significance.
The considerations that govern the choice of this level will be
discussed in section 3.3, and it is sufficient here to state that this
convention has been found to give a satisfactory rule of action in
most circumstances. It will be seen from Table 2.5 that a deviation
of twice the standard error corresponds roughly to a probability
level of 0·05, so we have the following working rule:

A deviation of a sample mean from an assumed population
value of twice the standard error lies on the 0·05 (or 5 per cent.)
level of significance, and a deviation greater than this amount is
statistically significant.
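Table 2.5 is not reproduced here, but the probabilities behind the working rule can be recomputed from the normal curve. A minimal Python sketch, using only the standard library's `math.erfc` (the complementary error function), gives the two-tailed probability of a deviation larger than k standard errors:

```python
from math import erfc, sqrt

def two_tailed_p(k):
    """Probability that a normal deviate exceeds k standard errors in either direction."""
    return erfc(k / sqrt(2))

for k in (1.0, 1.96, 2.0, 3.0):
    print(f"{k:4.2f} x SE  ->  P = {two_tailed_p(k):.4f}")
```

The printed probabilities are about 0·317, 0·050, 0·046 and 0·003, so exactly 1·96 standard errors corresponds to 0·05, and "twice the standard error" is a slightly conservative round figure.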

As a measure of dispersion of the sampling distribution of sample
means, the quartile deviation is sometimes used instead of the
standard error, and is called the probable error. The probable error
is 0·674 49 times the standard error, and three times the probable
error is roughly equivalent to twice the standard error. The probable
error has no particular advantages, and as it involves the troublesome
factor 0·674 49, it is rapidly going out of use.
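The factor 0·674 49 is simply the upper quartile of a normal curve with unit standard deviation. A one-line check with Python's standard `statistics.NormalDist` (a sketch, not part of the original text):

```python
from statistics import NormalDist

# The probable error is the quartile deviation: for a unit normal curve, half the
# interquartile range, which equals the upper quartile.
pe_factor = NormalDist().inv_cdf(0.75)
print(round(pe_factor, 5))       # 0.67449
print(round(3 * pe_factor, 2))   # 2.02, i.e. roughly two standard errors
```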


Significance of Difference between Two Sample Means 

3.21. The most common situation is one in which the investigator
has two samples, and wishes to know whether the difference between
them is real or may be attributed to errors of random sampling. Again
we shall confine attention to the two means. The appropriate hypothesis is
that the two samples are from populations having the same mean,
and the probability level of the observed difference is calculated
accordingly. Again the hypothesis is accepted if the level is fairly
high, and there is then no statistically significant difference between the
means; if the level is low (say below 0·05), the hypothesis is rejected
and the difference is regarded as real. To calculate these probabilities it is
necessary to know the sampling distribution of the difference
between two means.

Let the total numbers of individuals in the two samples be N₁
and N₂. Then it is possible to imagine a sampling experiment in
which a very large number of pairs of samples are taken from the
respective populations, the number of individuals in the first of each
pair being N₁ and the number in the second being N₂. For each
sample the mean may be found, and hence for each pair, the
difference between the two means. There will be as many differences
as there are pairs of samples, and these may be formed into
a frequency distribution: the sampling distribution of the difference
between two means. This distribution has been deduced mathematically
and is normal with a mean value of zero, as may be
expected, and a standard deviation (or standard error) larger than
the standard error of either sample mean taken separately. If SE₁
is the standard error of the distribution of one series of means and
SE₂ is that of the other, the standard error of the distribution of
differences is

$$SE_{1-2} = \sqrt{SE_1^2 + SE_2^2} \qquad (3.1)$$

provided the two samples are independent.* Further, if σ₁ and σ₂
are the standard deviations of the individuals in the two populations,
the standard error of the difference between the two means is

$$SE_{1-2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}} \qquad (3.2)$$

Usually, σ₁ and σ₂ do not differ appreciably, and it is reasonable
to include in the hypothesis the postulate that the two populations
are the same, so that σ₁ = σ₂.

In these circumstances, if the two single samples are of the same
size they have the same standard error, and the “twice the standard
error” criterion leads to the working rule that differences greater
than three times the standard error of a single mean are significant;
for if SE₁ = SE₂, then SE₁₋₂ = √2 SE₁, and 2√2 (= 2·83) is nearly equal to 3.

* This follows easily from the equations in section 3.55.
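Equations (3.1) and (3.2) translate directly into code. The sketch below (hypothetical helper names) also shows the "three times the standard error of a single mean" rule: for two equal, independent samples the difference has √2 times the standard error of one mean.

```python
from math import sqrt

def se_difference(se1, se2):
    """Equation (3.1): SE of the difference between two independent sample means."""
    return sqrt(se1 ** 2 + se2 ** 2)

def se_difference_from_sds(sigma1, n1, sigma2, n2):
    """Equation (3.2): the same SE written in terms of the population SDs."""
    return sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

# Two equal samples from one population: each mean has SE = 1 (in any unit).
se_d = se_difference(1.0, 1.0)
print(round(se_d, 3), round(2 * se_d, 2))   # 1.414 2.83
```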

In order to compute the standard error, it is necessary to know σ,
the standard deviation of the individuals in the population. Frequently
this is unknown, and as an approximation an estimate s
obtained from the samples is used instead. This practice limits the
application of the theory to large samples. Where there are two
samples and a common standard deviation is assumed, some combined
estimate s should be used, since the hypothesis is that the
samples are from the same population. If s₁ and s₂ are the two
sample estimates, a suitable combined estimate is

$$s^2 = \frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2} \qquad (3.3)$$

and substituting this value of s for the values of σ in equation (3.2)
we have

$$SE_{1-2} = s\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}.$$

Frequently, however, the standard errors of separate means are
calculated more or less as a routine, and it is convenient to use
these directly in equation (3.2), leading to the expression

$$SE_{1-2} = \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}.$$

This latter course is not consistent with the hypothesis that the
samples are from the same population, but when the samples are
large the two courses usually lead to practically the same results.*
The second is used in the following example.
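The two courses can be compared numerically. The following Python sketch (hypothetical function names) implements equation (3.3) with the pooled formula and the separate-estimates formula, and applies both to the figures of Table 3.1 for England and Scotland:

```python
from math import sqrt

def pooled_sd(n1, s1, n2, s2):
    """Equation (3.3): combined estimate of a common standard deviation."""
    return sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2))

def se_diff_pooled(n1, s1, n2, s2):
    """SE of the difference using the combined estimate in place of both sigmas."""
    return pooled_sd(n1, s1, n2, s2) * sqrt(1 / n1 + 1 / n2)

def se_diff_separate(n1, s1, n2, s2):
    """SE of the difference using each sample's own SD."""
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Table 3.1 figures: England (6 194 men, SD 2.548 in.), Scotland (1 304 men, SD 2.480 in.)
args = (6194, 2.548, 1304, 2.480)
print(round(se_diff_pooled(*args), 4), round(se_diff_separate(*args), 4))
```

The two results, about 0·077 3 and 0·075 9 inch, differ by only about 2 per cent., which illustrates why the choice seldom matters with large samples.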

The British Association Report for 1883 gives on p. 256 distributions
of the heights of men born in England and Scotland; as an
example we will test the significance of the difference between the
two means. The necessary data are in Table 3.1.

The difference in height is 1·108 1 inches, and its standard error is

$$\sqrt{0.03238^2 + 0.06868^2} = \pm 0.0759;$$

1·108 1 is 14·6 times 0·075 9, and we thus conclude that Scotsmen
are really taller than Englishmen.

* The second course is consistent with the original hypothesis that the
samples are from two populations with the same mean and different standard
deviations. This is a legitimate assumption, but is not usually as acceptable
on other grounds as the one given above.
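The whole calculation for the heights can be reproduced from the raw entries of Table 3.1 (sample size, mean and standard deviation for each country). This is a sketch using the separate-estimates course, as the text does, with the normal curve for the probability:

```python
from math import erfc, sqrt

# Table 3.1: (number in sample, mean height in inches, SD in inches)
england = (6194, 67.4375, 2.548)
scotland = (1304, 68.5456, 2.480)

def se_mean(n, sd):
    """Standard error of a sample mean."""
    return sd / sqrt(n)

se_e = se_mean(england[0], england[2])
se_s = se_mean(scotland[0], scotland[2])
diff = scotland[1] - england[1]
se_diff = sqrt(se_e ** 2 + se_s ** 2)   # equation (3.1)
ratio = diff / se_diff
p = erfc(ratio / sqrt(2))                # two-tailed normal probability
print(f"difference = {diff:.4f} in., SE = {se_diff:.4f}, ratio = {ratio:.1f}, P = {p:.1e}")
```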

TABLE 3.1

Country    Number in   Mean Height   Standard Deviation   Standard Error of Mean
           Sample      (inches)      (inches)             (inches)

England    6 194       67·437 5      2·548                2·548 ÷ √6 194 = ±0·032 38
Scotland   1 304       68·545 6      2·480                2·480 ÷ √1 304 = ±0·068 68

General Discussion

3.3. We shall first discuss the choice of the level of statistical
significance. If we work consistently according to the rule that the
hypothesis is accepted provided the observed difference or deviation
is below a given level and is rejected if the deviation is above,
one of four possible situations may arise*:

We may be in error because we:

(1) Reject a true hypothesis, or

(2) Accept a false one;

or we may be correct because we:

(3) Accept a true hypothesis, or

(4) Reject a false one.

In accepting a hypothesis according to the rule, we only do so 
tentatively, since no hypothesis is, or ever can be, finally proved. 

In the long run of statistical experience, the ratio of wrong
inferences under (1) to total inferences under (1) and (3) when the
hypothesis is a true one can be made as low as we please by making
the level of significance high;† indeed, this ratio is the probability
level, and the rule already proposed leads to a true hypothesis being
rejected once in every twenty experiments for which a true hypothesis
is made. By adopting a level of 0·01, this kind of error is
made only once in every hundred experiments. Unfortunately, by
using a higher level of significance, we are more likely to accept
the hypothesis and hence are more likely to be wrong under situation
(2). Thus the choice of the level for a criterion of acceptance or
rejection is a compromise between the risks of the two types of error.

* This analysis was given by Neyman and Pearson (1928), and some
writers have accepted the classification to the extent of referring to
situations (1) and (2) as “errors of the first and second kinds.”

† A low probability corresponds to a high level of significance.

It is not possible, however, to state the proportion of
erroneous inferences under (2) to the total number of experiments in which a
false hypothesis has been made. It will depend to a considerable
extent on how near the hypothesis is to the truth as well as on the
stringency of the test.
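The dependence of errors under (2) on how far the hypothesis is from the truth can be made concrete under assumptions the text does not state explicitly: a normal sampling distribution with a known standard error, and a true difference measured in standard errors. The helper `prob_accept` below is a hypothetical sketch giving the probability that the hypothesis of no difference is (wrongly, unless the shift is zero) accepted.

```python
from statistics import NormalDist

def prob_accept(true_shift, level=0.05):
    """Probability that a two-tailed normal test at `level` accepts the hypothesis of
    no difference when the true difference is `true_shift` standard errors.
    A shift of 0 gives situation (3); any other shift gives the risk of situation (2)."""
    z = NormalDist()
    crit = z.inv_cdf(1 - level / 2)
    # The observed deviation is normal with mean true_shift and unit SD;
    # the hypothesis is accepted when it falls between -crit and +crit.
    return z.cdf(crit - true_shift) - z.cdf(-crit - true_shift)

for shift in (0.0, 1.0, 2.0, 3.0):
    print(shift, round(prob_accept(shift, 0.05), 3), round(prob_accept(shift, 0.01), 3))
```

With a true difference of two standard errors, the false hypothesis is accepted about 48 per cent. of the time at the 0·05 level and about 72 per cent. at the 0·01 level, which is the compromise described above.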

The choice of the critical level of significance will depend to some
extent on circumstances, but it is governed largely by the following
consideration. We usually have a large body of knowledge and a
scientific tradition to guide us, so that there is a fairly strong
predisposition towards accepting any hypothesis we make. One of the
strongest traditions favours simple hypotheses involving few
constants in preference to complex ones involving many constants. For
example, in investigating a possible difference in height between
Englishmen and Scotsmen as in section 3.21, it was in accordance
with scientific practice to prefer the hypothesis of a single mean
height to one involving two heights, i.e. to assume no difference
until the contrary was proved. It is because of this, and because the
effect of an error under (2) is less serious than one under (1), that
the chosen criterion of significance usually has a probability level
as low as 0·05; and in instances where there are exceptionally strong
a priori reasons for preferring the hypothesis, or where an error under (1)
may have important consequences, the appropriate probability level
is 0·01 or even lower. On the other hand, most hypotheses tend to
be of a negative character in that they are in accordance with existing
knowledge, and by choosing too low a probability, corresponding to
a level of significance that is too high, the proportion of mistaken
inferences of the second kind may be too great, and advance of
knowledge may be unjustifiably impeded. Too much scepticism may
be obstructive.

3.31. It can be seen that one means of reducing the effect of
erroneous inferences of the second kind without affecting those of
the first is by reducing the standard error of the statistical constant
under test. For a given level of significance, a smaller standard error
leads to a correspondingly smaller deviation lying just on the chosen
level of significance, and so to a smaller deviation wrongfully
discounted as not significant. The standard error may be reduced by
increasing the size of the sample, and we shall see in section 3.7
that from this point of view some statistical constants are preferable
to others that measure equivalent properties of the frequency
distribution.


The statistical significance gives no information as to the magnitude
or practical importance of any difference; that can only be judged
by one with technical knowledge of the subject to which these
methods are applied. A very large sample may make very small and
unimportant differences overwhelmingly significant, while if the
sample is small, large and important differences may be obscured by
random errors. The verdict “not significant,” therefore, is more like
the “not proven” of Scots law than “not guilty.” If an approximate
value of the standard deviations of the populations is available (say
from preliminary samples), it is always possible to estimate how
large the samples must be to show up a given difference: N should
be amply large enough to make twice the standard error of the
difference less than the specified difference.
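A minimal sketch of this sample-size estimate, assuming two samples of equal size N drawn from populations with a common, approximately known standard deviation σ, so that the standard error of the difference is σ√(2/N). The function name and the figures in the usage line (σ = 2·4 in., difference 0·25 in.) are hypothetical. The result is only the bare minimum; the text advises taking N amply above it.

```python
from math import sqrt

def min_size_per_sample(sigma, difference, multiple=2.0):
    """Smallest N per sample (two equal samples, common SD sigma) for which
    `multiple` times the SE of the difference, sigma * sqrt(2 / N), is less than
    `difference`."""
    # Start from the algebraic solution N > 2 * (multiple * sigma / difference) ** 2,
    # then step upwards so that rounding cannot leave the condition unmet.
    n = max(1, int(2 * (multiple * sigma / difference) ** 2))
    while multiple * sigma * sqrt(2 / n) >= difference:
        n += 1
    return n

n = min_size_per_sample(2.4, 0.25)   # hypothetical: SD 2.4 in., difference 0.25 in.
print(n)                             # 738
```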

If a sample mean is significantly different from a hypothetical 
population value or the difference between two sample means is 
significant, statistical theory can shed no light on the cause of the 
deviation. It may be either that the bulk sampled is really different 
from the hypothetical population or that the sample is not truly 
representative and random; either that there is a real difference 
between the two populations sampled or that there is some 
difference in sampling technique. 

It may be noted that there is nothing in these tests of significance 
that provides a criterion for choosing between hypotheses that may 
be compatible with the data. Such a choice must be made on non- 
statistical grounds. 

3.32. So far we have taken as the 0·05 level of significance a positive
or negative deviation for which the two “tails” of the probability
curve total 0·05, i.e. each “tail” separately is 0·025. This is because
we usually have no a priori reason for deciding that the deviation
or difference should have one sign or another. In the example of
section 3.2 treating of the lengths of cotton hairs, the mean of the
sample of 1 000 happened to be greater than the assumed population
value; we should have tested the deviation in the same way had the
mean been less. In the example of section 3.21 the Scotsmen were
apparently taller, on the average, than the Englishmen; we should
have tested the difference had the Scotsmen been shorter. In
circumstances like these, the probabilities must be added for both
positive and negative deviations or differences, so as to make the
calculation correspond to practice and to give correctly for the long
run of experience the ratio of mistaken to total inferences under
situations (1) and (3) above. For differences, the argument may be
put in another way which may be more helpful. Our hypothesis is
that the true value is zero, and the question we ask is, “What is the
probability that random sampling will give a difference greater than
that observed?” Now we do not know the sign, and it is purely an
accident whether in comparing two means, x̄₁ and x̄₂ (say), we take
(x̄₁ − x̄₂) or (x̄₂ − x̄₁), so we may always make the difference positive
(= +d, say). Then in considering the sampling curve we must also
do the same thing and only use the positive half of it, in which
differences can only range from zero to plus infinity. Thus the
probability that, of all the positive differences that could be obtained
from random sampling, one would be greater than d is the ratio
of the area beyond the ordinate at +d to the total area of the halved
normal curve. In Fig. 3, OE is twice the standard error, and the
area under the curve to the right of EF is 5 per cent. of the half of
the curve to the right of the origin, O, or 2·5 per cent. of the full
curve. Thus, there may be a distinction between the 5 per cent.
value, or point at which an ordinate cuts off a tail of 5 per cent. of
the total area, and the 5 per cent. level of significance.

Sometimes, it is reasonable to regard a deviation corresponding
to a single “tail” of 0·05 as lying on the 0·05 level of significance.
For example, a strong believer in Scottish superiority might argue
that Scotsmen cannot possibly be shorter than Englishmen, that
they must either be equally tall or taller. He would only subject to
statistical test a difference that showed Scotsmen as taller, and any
difference of the opposite sign he would dismiss without test as
being due to random errors. It is consistent with such an attitude
to regard a difference of 1·65 times the standard error, corresponding
to a single “tail” of 0·05, as lying on the 5 per cent. level of
significance. It is only legitimate to do this, however, if the grounds for
deciding the sign of the difference or deviation (if real) are a priori
and independent of the result given by the sample.
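The two criteria differ only in which normal point is used. A short sketch with Python's `statistics.NormalDist` recovers both critical deviations:

```python
from statistics import NormalDist

z = NormalDist()
two_tailed_5 = z.inv_cdf(1 - 0.05 / 2)   # each tail 0.025: the usual 5 per cent. level
one_tailed_5 = z.inv_cdf(1 - 0.05)       # a single tail of 0.05
print(round(two_tailed_5, 3), round(one_tailed_5, 3))   # 1.96 1.645
```

The single-tail point is 1·645 standard errors, which the text rounds upwards to 1·65.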

Groups of Samples 

3.33. The probabilities deduced in accordance with the foregoing
theory are only for single pairs of samples taken at random, and when
there are several samples, the problem of significance is more
complicated. If we had a hundred differences whose true value was zero,
we should expect four or five of them to be greater than twice the
standard error, but if we applied the simple theory by
“rule-of-thumb,” we should erroneously report them as real. Similar
considerations must be applied to groups of fewer than one hundred tests,
and we will work out the significance of the biggest of n deviations.

TABLE 3.2

Deviation ÷ Standard Error for the Largest of n Deviations
Lying on the 0·05 Level of Significance

   n     Deviation ÷ Standard Error
   1              2·0
   2              2·2
   4              2·5
   6              2·6
  10              2·8

Suppose there are n independent differences between pairs of
means; e.g. the aim may be to see if several treatments have an
effect on some quantity, and one of each pair of samples may be an
untreated control (a separate one for each pair) and the other be
given one of the treatments. We wish to test if the biggest ratio
difference/standard error of difference is significant. Let the
probability that errors of random sampling would give a single difference
equal to or greater than d (say) be P (as found in the ordinary way
from probability tables), and let Pₙ be the probability that the
biggest of n would equal or exceed d. Then (1 − Pₙ) is the
probability that all n differences would be less than d, and (1 − P) that one
random difference would be less; from Rule II (section 2.11) we have

$$1 - P_n = (1 - P)^n,$$

whence

$$P_n = 1 - (1 - P)^n.$$

Table 3.2 has been worked out to show what value the largest of
n deviations (in terms of the standard error) must reach to lie on
the 0·05 level of significance for the normal distribution.
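Table 3.2 can be regenerated from Pₙ = 1 − (1 − P)ⁿ: set Pₙ = 0·05, solve for the single-deviation probability P, and take the two-tailed normal point for that P. A Python sketch (the function name is hypothetical):

```python
from statistics import NormalDist

def largest_of_n_critical(n, level=0.05):
    """Deviation (in SE units) that the largest of n independent normal deviations
    must reach to lie on `level`."""
    p_single = 1 - (1 - level) ** (1 / n)          # invert P_n = 1 - (1 - P)**n
    return NormalDist().inv_cdf(1 - p_single / 2)  # two-tailed point for P

for n in (1, 2, 4, 6, 10):
    print(n, round(largest_of_n_critical(n), 1))
```

Rounded to one decimal place, the values agree with the table: 2·0, 2·2, 2·5, 2·6 and 2·8.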

Sometimes we want to know if the difference between the largest
and smallest in a group of sample means is significant; such a
difference is a range. E. S. Pearson (1932) gives some values of
the range at various levels of significance, and Table 3.3 has been
calculated from them. When there are four samples, a difference of
2·6 times the standard error of a difference is equivalent to one of
twice the standard error for a pair.

TABLE 3.3

Range ÷ Standard Error of a Difference
on 0·05 and 0·01 Levels of Significance

Number of Samples   P = 0·05   P = 0·01
        2             2·0        2·6
        4             2·6        3·1
        6             2·9        3·4
       10             3·2        3·6

We shall illustrate this by the data of Table 3.4, which have been
obtained from Table 10.6 showing the mean corn yield per plot
for a number of agricultural plots subjected to five treatments.

TABLE 3.4

Treatment   Mean Yield per Plot (grammes)
    A                295·2
    B                297·5
    C                276·3
    D                272·2
    E                271·8

We are given that the standard error of any one mean is 8·712,
so that the standard error of any random difference between two
means is 12·32 grammes.* The largest difference is between treatments
B and E, and is 25·7, or 2·1 times the standard error. For a
single randomly chosen pair of means, this difference would be
significant, but here it is the largest difference in a set of five means,
and from Table 3.3 we deduce that it is well below the 5 per cent.
level of significance.

* The treatments are “dummies.” The standard error is deduced from
Table 10.71.
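The comparison can be sketched in Python from the means of Table 3.4 and the quoted standard error of a single mean. Table 3.3 gives 2·6 for four samples and 2·9 for six at P = 0·05, so for five means the criterion lies between these; the sketch only compares against the smaller figure.

```python
from itertools import combinations
from math import sqrt

means = {"A": 295.2, "B": 297.5, "C": 276.3, "D": 272.2, "E": 271.8}
se_mean = 8.712
se_diff = se_mean * sqrt(2)   # SE of the difference between two means, about 12.32 g

# The pair of treatments whose means differ most, i.e. the range of the five means.
a, b = max(combinations(means, 2), key=lambda pair: abs(means[pair[0]] - means[pair[1]]))
largest = abs(means[a] - means[b])
ratio = largest / se_diff
print(a, b, round(largest, 1), round(ratio, 1))   # B E 25.7 2.1
```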

If it is desired to compare the means of several samples as a whole, 
there are other and more suitable methods that will be described 
in Chapter VI; but if the wish is to compare selected pairs, the 
above considerations show at least that more stringent tests of 
significance should be applied than when a single pair is taken at 
random. 

Applicability of Normal Sampling Theory 

3.4. The correspondence of a deviation of twice the standard error
to the 0·05 level of significance depends on the sampling distribution
of the mean being normal. Strictly, this only happens when
the individuals in the sampled population are normal, but for large 
samples taken from most non-normal populations that are likely to 
be met with in practice, the distribution of the mean is nearly 
normal, and the normal sampling theory is sufficiently close an 
approximation for practical purposes. This frequently is a second 
reason for applying the methods of this chapter to large samples only. 

The population and a sample may be characterised by constants 
other than the mean, and in fact any of the constants mentioned 
in Chapter I may be used. These all have their sampling distribu- 
tions and, if they are known, deviations corresponding to various 
levels of significance may be calculated. For large samples, however, 
most of these distributions tend to be nearly normal, and as an 
approximation it is usual to apply the normal theory by calculating
the standard error of the constant, and regarding deviations of twice
this standard error as lying on the 5 per cent. level of significance.
In this chapter, further applications of sampling theory will usually 
follow this procedure, but it is desirable for the sake of subsequent 
applications to discuss the use of skew sampling distributions. This 
discussion follows in the next section. 

Tests of Significance Based on Skew Sampling Distributions 

3.41. For a symmetrical distribution we decided in section 3.32
to regard the 2·5 per cent. points (on the positive and negative
sides of the population value) as lying on the 5 per cent. level of
significance, and since the two deviations are equal, only one need
be specified. When the distribution is not symmetrical, some
modification is obviously necessary, and there are three possibilities.

(1) We may retain the equality of the positive and negative
deviations, and regard as lying on the 5 per cent. level of significance a
value such that the areas of the two tails beyond ordinates at these
points together add up to 5 per cent. of the total area. In Fig. 6
(which is not drawn to any scale, and is only diagrammatic), O is
the population value, and OA = OA′ is the significant deviation,
the areas beyond the ordinates at A and A′ being 5 per cent. of the
total.

Such a criterion is to be condemned, for it gives positive and
negative deviations equal importance.

(2) Extending the argument already applied to differences in
section 3.32, we may use separate tests for positive and negative
deviations from the population value. If the deviation under test is
positive, we use only that part of the sampling distribution on the
positive side of the ordinate drawn at the population value, and regard
as lying on the 5 per cent. level the deviation at which an ordinate
cuts off a tail of 5 per cent. of the positive part of the curve. Similarly,
if the deviation is negative we use only the negative part of the
sampling distribution. In Fig. 6, OB and OB′ are the two deviations,
and the area to the left of the ordinate at B′ is 0·05 of that to the
left of OM, while the area to the right of the ordinate at B is 0·05
of that to the right of OM. Because OM does not divide the total
area into equal parts, the two tails are not equal and therefore are
not 0·025 of the total area.

(3) We may regard as lying on the 5 per cent. level of significance
the deviations beyond which the two tails are each equal to 2·5 per
cent. of the total area under the curve. In Fig. 6, C and C′ are about
on these points. The shape of any frequency curve may be altered
by plotting the abscissae in terms of some function of x (e.g. putting
x′ = log x), and it is theoretically possible to choose the function
so that the curve reduces to the normal form. Such a procedure
alters the relative positions of the mean and mode, but not of
deviations defined by the ratios in which they divide up the area.
Consequently the 2·5 per cent. values of the transformed variate still
remain 2·5 per cent. values of the untransformed variate x in the
skew distribution. If the normal curve is regarded as the fundamental
curve of errors, the deviations lying on the 5 per cent. level of
significance according to this criterion would satisfy the normal
criterion if the curve were transformed.

At the moment, both the second and third uses of skew curves 
for testing the significance of differences appear to be consistent 
extensions of the criteria developed for symmetrical curves, and it 
is difficult to decide between them. Fortunately, the difference in 
results obtained by the two methods is of no practical importance, 
and the last one will probably be most convenient in practice. 
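Criterion (3) and its invariance under transformation can be illustrated with a hypothetical skew distribution: a log-normal variate x, for which x′ = log x is exactly normal. This Python sketch (the parameters μ = 0, σ = 0·5 are chosen only for illustration) finds the 2·5 per cent. points on the normal scale and maps them back with exp. They remain the 2·5 per cent. points of x, but they lie unequal distances from the mean of x.

```python
from math import exp
from statistics import NormalDist

# Hypothetical skew variate: log x is normal with mean mu and SD sigma.
mu, sigma = 0.0, 0.5
normal_scale = NormalDist(mu, sigma)

# Criterion (3): points cutting off 2.5 per cent. in each tail. Because exp is
# monotonic, the points found on the normal scale are still 2.5 per cent. points of x.
lower = exp(normal_scale.inv_cdf(0.025))
upper = exp(normal_scale.inv_cdf(0.975))
mean_x = exp(mu + sigma ** 2 / 2)   # mean of the log-normal variate

print(round(lower, 3), round(upper, 3), round(mean_x, 3))
print("distances from the mean:", round(mean_x - lower, 3), round(upper - mean_x, 3))
```

The upper point lies about twice as far from the mean as the lower one, which is exactly the asymmetry that criterion (1) ignores.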

Sampling Errors of Various Constants 
Means of Binomial and Poisson Distributions

3.51. When developing the binomial distribution in Chapter II
we regarded the sets of trials as individuals and the number of
occurrences per set as the variate. If, however, we regard the trials
as individuals for which the variate can only take one of two
characteristics, viz. success or failure, the set of n trials becomes a sample
of n and the binomial distribution becomes a sampling distribution
of the number of successes per sample. From this point of view, the
small n of the binomial is equivalent to the large N of this chapter.
As stated in section 2.5, when the number in the sample is large,
the binomial distribution approaches the normal and the normal
sampling theory may be applied.


For the binomial the standard error of the number of successes,
f (whose expected value is np), is the standard deviation given in
equation (2.2), p. 48, and is

$$\pm\sqrt{npq};$$

on dividing this by n we obtain

$$\text{Standard Error of } p = \pm\sqrt{\frac{pq}{n}}.$$

Darbishire (1904), on crossing waltzing with normal mice, found
in the F₂ generation 458 normals and 97 waltzers (total 555). The
Mendelian expectation for f is 416 normals, with a standard error of

$$\sqrt{555 \times \tfrac{3}{4} \times \tfrac{1}{4}} = 10.2.$$

The difference between the actual and expected number of normal
mice is 42, and being 4·1 times the standard error, it is significant; it
could only arise from random sampling four times in 100 000 trials.

Parkes and Drummond (1925) give the data of Table 3.5, showing
the effect of vitamin B deficiency on the sex-ratio of the offspring
of rats.

TABLE 3.5

Diet                   Males   Females   Total Young   Percentage Males
Vitamin B Deficient      123       153           276              44·57
Vitamin B Sufficient     145       150           295              49·15
Totals                   268       303           571

In comparing the sex-ratios, we are not comparing an experimental
ratio with a theoretical one, but two experimental ones. Hence, we
do not know the values of p in the infinite population required for the
calculation of the standard errors; we will use the sample values as
an approximation. Since pq = p − p², the standard errors of p are

$$\pm\sqrt{\frac{0.4457 - 0.1986}{276}} = \pm 0.0299 \quad\text{and}\quad \pm\sqrt{\frac{0.4915 - 0.2416}{295}} = \pm 0.0291,$$

and that of the difference is ±0·041 7, or ±4·17 per cent. The
difference in sex-ratio is 4·58 per cent., and being only 1·10 times its
standard error, it is insignificant; it would arise from random errors nearly
27 times in 100 trials.
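Both binomial examples can be checked with a short Python sketch using the formulas √(npq) and √(pq/n). For the rats, as in the text, the sample proportions stand in for the unknown population values of p. The helper names are hypothetical.

```python
from math import erfc, sqrt

def se_count(n, p):
    """Standard error of the number of successes in n trials: sqrt(n*p*q)."""
    return sqrt(n * p * (1 - p))

def se_proportion(n, p):
    """Standard error of the proportion of successes: sqrt(p*q/n)."""
    return sqrt(p * (1 - p) / n)

# Darbishire: 458 normals out of 555, Mendelian expectation 3/4 normals.
n, observed = 555, 458
expected = 0.75 * n
z_mice = (observed - expected) / se_count(n, 0.75)

# Parkes and Drummond (Table 3.5): proportions of males on the two diets.
p1, n1 = 123 / 276, 276
p2, n2 = 145 / 295, 295
se_d = sqrt(se_proportion(n1, p1) ** 2 + se_proportion(n2, p2) ** 2)
z_rats = (p2 - p1) / se_d

print(round(expected), round(se_count(n, 0.75), 1), round(z_mice, 1))   # 416 10.2 4.1
print(round(se_d, 4), round(z_rats, 2), round(erfc(z_rats / sqrt(2)), 2))   # 0.0417 1.1 0.27
```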


3.52. When the mean number of failures (m) per set is large,
the Poisson distribution also approaches the normal form. Then
the standard error per set is √m, the standard error of the mean
of N sets is √(m/N), and that of the total number of failures in
N sets, mN, is √(mN).

Thus for the distribution of yeast cells of Table 2.4, the total number
of cells counted is 1 872, with a standard error

$$\pm\sqrt{1872} = \pm 43.3;$$

the mean number of cells per square is 4·68 (1 872 cells in 400 squares),
and its standard error is

$$\pm\sqrt{\frac{4.68}{400}} = \pm 0.108.$$
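The yeast-cell figures follow from the Poisson rule that the variance equals the mean. In this sketch the number of squares, 400, is not quoted directly in the text but inferred from 1 872 ÷ 4·68; Table 2.4 itself is not reproduced here.

```python
from math import sqrt

total_cells = 1872
squares = 400                                  # inferred from 1 872 / 4.68
mean_per_square = total_cells / squares
se_total = sqrt(total_cells)                   # sqrt(mN) for the total count
se_mean = sqrt(mean_per_square / squares)      # sqrt(m/N) for the mean per square
print(mean_per_square, round(se_total, 1), round(se_mean, 3))   # 4.68 43.3 0.108
```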


Measures of Dispersion

3.53. The standard deviation estimated from the variance has a
standard error of

$$\frac{\sigma}{\sqrt{2N}},$$

where σ is the standard deviation of the population and N is the
size of the sample. Its distribution is not normal, but for large samples
it approaches normality; e.g. at N = 100, β₁ = 0·005 and β₂ = 3·000 0.
Let us see if the Scotsmen of Table 3.1 are really
more regular in height, as well as being taller on the average, than
the Englishmen. The standard error of a standard deviation is



RANDOM SAMPLING AND STATISTICAL INFERENCE 85 

1/√2 times that of a mean, so for the difference of standard
deviations of the table it is

$$\frac{1}{\sqrt{2}} \times 0.0759 = \pm 0.0537.$$

The difference (2·548 − 2·480 = 0·068), being only 1·27 times
its standard error, is not significant; in fact, random sampling would
give as big a difference in about one trial in five (P = 0·2).
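A short check of the comparison of standard deviations, starting from the standard error of the difference of means found earlier (0·075 9) and the text's rule that a standard deviation has 1/√2 times the standard error of a mean:

```python
from math import erfc, sqrt

se_diff_means = 0.0759                  # SE of the difference of the two mean heights
se_diff_sds = se_diff_means / sqrt(2)   # an SD has 1/sqrt(2) times the SE of a mean
diff_sds = 2.548 - 2.480
ratio = diff_sds / se_diff_sds
p = erfc(ratio / sqrt(2))               # two-tailed normal probability
print(round(se_diff_sds, 4), round(ratio, 2), round(p, 2))   # 0.0537 1.27 0.21
```

A probability of about 0·21 is the "one trial in five" of the text.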

Only when found from the second moment has the standard
deviation the above standard error. We shall now deal with its
standard error when estimated from the range. E. S. Pearson (1926
and 1932) has worked out the sampling distribution of the range
fairly fully, and although for no size of sub-sample is it normal, it
is only moderately skew for those of about 10; for sub-samples
of 10, β₁ = 0·156 and β₂ is about 3·2. In such circumstances, we
may use the standard error as an approximate description of the
sampling errors. Pearson gives the standard error (or standard deviation) of
the range for sub-samples of several sizes, and for those of 10 it is 0·797σ,
where σ is the true standard deviation. Suppose the sample of N
contains m sub-samples of 10, so that N = 10m. Then from
section 2.71,

$$\text{Standard Error of Mean Range} = \frac{0.797\,\sigma}{\sqrt{m}}.$$

We also have from Table 1.2, for sub-samples of 10,

$$\text{Mean Range} = 3.078\,\sigma,$$

whence

$$\text{Estimated Standard Deviation} = \frac{\text{Mean Range}}{3.078},$$

$$\text{Standard Error of Estimated Standard Deviation} = \frac{\text{Standard Error of Mean Range}}{3.078} = \frac{0.797\,\sigma}{3.078\sqrt{m}} = \frac{1.158\,\sigma}{\sqrt{2N}}.$$

This may be compared with the standard error of the estimate
obtained from the variance, σ/√(2N).

In large samples, the standard error of the mean deviation is

$$1.068\,\frac{\delta}{\sqrt{2N}} = 0.852\,\frac{\sigma}{\sqrt{2N}},$$

where δ is the population value of the mean deviation. It follows
that the standard error of the standard deviation estimated from the
mean deviation (as 1·253 times the mean deviation) is

$$1.253 \times 0.852\,\frac{\sigma}{\sqrt{2N}} = 1.068\,\frac{\sigma}{\sqrt{2N}}.$$
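The three estimates of σ can be compared by the coefficient c in SE = cσ/√(2N). The sketch below uses the text's figures for the range (0·797 and 3·078 for sub-samples of 10) and, for the mean deviation, the standard large-sample variance σ²(1 − 2/π)/N for a normal population, which we assume here; it reduces to c = √(π − 2) ≈ 1·068. The square c² indicates roughly how many times larger the sample must be to match the estimate from the variance, which bears on the remark in section 3.31 that some constants are preferable to others.

```python
from math import pi, sqrt

# Coefficient c in  SE = c * sigma / sqrt(2N)  for three large-sample estimates of sigma
# from a normal population (N individuals in all).
c_variance = 1.0                                   # from the second moment
c_range_10 = (0.797 / 3.078) * sqrt(10) * sqrt(2)  # mean range of m = N/10 sub-samples of 10
c_mean_dev = sqrt(pi - 2)                          # 1.253 x mean deviation; exact form of 1.068

for name, c in (("variance", c_variance), ("range (10s)", c_range_10), ("mean deviation", c_mean_dev)):
    # c**2 is how many times larger the sample must be to match the variance-based SE
    print(f"{name:15s} c = {c:.3f}  relative sample size = {c ** 2:.2f}")
```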


Constants of Shape 

3 . 54 . The j8 ratios are useful for testing the departure of any data 
from normality, but their distributions in samples drawn from an 
infinite population of that form have not yet been worked out. The 
first four moments of the sampling distributions have been derived 
by Fisher (1930^2), and from approximate values E. S. Pearson (1930) 
has determined appropriate empirical frequency curves of K. Pear- 
son’s system. Curves found in such a way usually give good approxi- 
mations to the actual distributions. 

To test asymmetry, yi = (with the same sign as fju^) is used. 
For sample^ N from the normal population, this has a standard 

error of V 6/iV (to a first approximation), and the distribution itself 
is so nearly normal that a deviation of twice the standard error lies 
practically on the 0*05 level of significance. 

The standard error of ^2 is V" 24/72 (to a first approximation), 
but the distribution's so skew, and this approximation is so poor, 
that it does not give a reliable test. Pearson’s tables {loc, cit) give 
for samples from the normal population, values of ^2 
side of the population value (3 • o) which cut off tails of 5 and i per 
cent, of the whole curve; these tables should be used. From the 
discussion in section 3.41, it is suggested that the 5 per cent, level 
of significance should be taken as lying between the 5 and i per 
cent, values. 

For the distribution of heights of Table 1.5, we have

$$\gamma_1 = \sqrt{\beta_1} = -0.116 \pm 0.0746, \qquad \beta_2 = 2.908 \qquad \text{and} \qquad N = 1\,078.$$

γ₁ is less than twice its standard error, and so is insignificant, while from Pearson's table, when N = 1 000, the lower value of β₂ with a "tail" of 0.05 is 2.76; since the value above is nearer 3 than that, we may conclude that the data are, as far as we can tell, from a normal population.
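As a check on the arithmetic above, the sketch below computes γ₁ and β₂ from raw observations and the first-approximation standard errors √(6/N) and √(24/N). It assumes moments are taken about the sample mean with divisor N; the function names are illustrative only. As the text warns, β₂ should still be judged against Pearson's tables rather than its standard error.

```python
import math


def shape_constants(xs):
    """Return (gamma1, beta2) for a list of raw observations.

    Moments are taken about the sample mean with divisor N (an assumption).
    gamma1 = mu3 / mu2**1.5, i.e. sqrt(beta1) carrying the sign of mu3.
    """
    n = len(xs)
    mean = sum(xs) / n
    mu2 = sum((x - mean) ** 2 for x in xs) / n
    mu3 = sum((x - mean) ** 3 for x in xs) / n
    mu4 = sum((x - mean) ** 4 for x in xs) / n
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2


def first_approx_standard_errors(n):
    """Large-sample standard errors of gamma1 and beta2 in normal samples of n."""
    return math.sqrt(6 / n), math.sqrt(24 / n)


# Heights of Table 1.5: gamma1 = -0.116 with N = 1078.
se_gamma1, se_beta2 = first_approx_standard_errors(1078)
print(round(se_gamma1, 4))          # 0.0746
print(abs(-0.116) < 2 * se_gamma1)  # True: gamma1 is not significant
```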
Standard Errors of Functions of Statistical Constants

3.55. Sometimes it is desired to calculate the standard errors of functions of one or more statistical constants, the errors of the individual constants being known. We have already had an example in the difference between two means; the ratio and product of two means, and the square of the standard deviation, are other examples. We shall describe here an approximate method that is applicable to large samples. Where there is more than one constant we shall assume, except in one instance, that they are obtained from independent samples.

Let the population values of the constants be α and β, let the function be

$$\lambda = f(\alpha, \beta),$$

and let the corresponding sample values for any one pair of samples be l, a and b, where

$$l = \lambda + \delta\lambda, \qquad a = \alpha + \delta\alpha \qquad \text{and} \qquad b = \beta + \delta\beta;$$

δ is used as a sign meaning a deviation. Then for the pair of samples

$$l = f(\alpha + \delta\alpha,\ \beta + \delta\beta).$$

It can be shown that, for the purposes of deducing standard errors and variances, the relations between δλ, δα and δβ may be approximately derived by regarding them as mathematical differentials, so that

$$\delta\lambda = \frac{\partial\lambda}{\partial\alpha}\,\delta\alpha + \frac{\partial\lambda}{\partial\beta}\,\delta\beta.$$

The degree of approximation involves neglecting quantities of the order 1/N compared with unity, where N is the number in the sample.

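The mean values of the squares of δλ, δα and δβ for all possible pairs of samples from the population are the squares of the corresponding standard errors and may be written (SE_λ)², (SE_α)² and (SE_β)². Hence, by squaring the terms of the above equation and finding the means for all pairs of samples,

$$(SE_\lambda)^2 = \left(\frac{\partial\lambda}{\partial\alpha}\right)^2 (SE_\alpha)^2 + \left(\frac{\partial\lambda}{\partial\beta}\right)^2 (SE_\beta)^2 + 2\,\frac{\partial\lambda}{\partial\alpha}\,\frac{\partial\lambda}{\partial\beta}\,[\delta\alpha\,\delta\beta],$$

where [δα δβ] is the mean value of the product of the deviations δα and δβ for all pairs of samples. If the two samples in the pair are independent, this mean product is zero, as will be shown in Chapter VII. Hence, for independent pairs of samples,

$$(SE_\lambda)^2 = \left(\frac{\partial\lambda}{\partial\alpha}\right)^2 (SE_\alpha)^2 + \left(\frac{\partial\lambda}{\partial\beta}\right)^2 (SE_\beta)^2. \qquad (3.6)$$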

This equation may easily be extended to three or more statistical 
constants. 

When the function is the difference between two means, equa- 
tion (3.1), section 3.31, follows from (3.6) directly. 

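By way of example, let us deduce the standard error of the variance of a sample in terms of that of the standard deviation s, so that l = s², a = s and there is no b. Let the population value of the standard deviation be σ; then

$$\lambda = \sigma^2, \qquad \frac{\partial\lambda}{\partial\sigma} = 2\sigma,$$

$$(SE_{s^2})^2 = 4\sigma^2 (SE_s)^2 = \frac{2\sigma^4}{N}, \qquad \text{since} \qquad (SE_s)^2 = \frac{\sigma^2}{2N}.$$

As another example, let l be the ratio between two means x̄ and ȳ, with corresponding population values λ, ξ and η. Then

$$\lambda = \frac{\xi}{\eta}, \qquad \frac{\partial\lambda}{\partial\xi} = \frac{1}{\eta}, \qquad \frac{\partial\lambda}{\partial\eta} = -\frac{\xi}{\eta^2},$$

$$(SE_l)^2 = \frac{(SE_{\bar x})^2}{\eta^2} + \frac{\xi^2 (SE_{\bar y})^2}{\eta^4} = \lambda^2\left\{\frac{(SE_{\bar x})^2}{\xi^2} + \frac{(SE_{\bar y})^2}{\eta^2}\right\}.$$

The standard errors of x̄ and ȳ have already been given in section 3.71.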

It has been shown in section 3.73 that the mean and standard deviation estimated from the same sample are independent, and it follows from this that equation (3.6) may be used to determine the standard error of the coefficient of variation. If 100κ, ξ and σ are respectively the population values of the coefficient of variation, mean and standard deviation, corresponding to λ, α and β, and k is the sample estimate of κ, it is easy to see from equation (3.6) and the standard errors given in sections 3.71 and 3.53 that the standard error of the coefficient of variation is

$$100\,\kappa\sqrt{\frac{2\kappa^2 + 1}{2N}}.$$

After reading Chapter VII, readers will be able to evaluate the product term [δα δβ] and so deal with constants that are not independent.
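Equation (3.6) can be applied mechanically once the partial derivatives are known. The sketch below (function names hypothetical) approximates them by central differences, so any function of independent constants can be handled, and checks the result against the closed form for the coefficient of variation. It assumes the large-sample standard errors σ/√N for the mean and σ/√(2N) for the standard deviation, and a hypothetical population with mean 40 and standard deviation 2.

```python
import math


def delta_method_se(f, values, standard_errors, h=1e-6):
    """Standard error of f(*values) by equation (3.6).

    Assumes the constants are independent (no product term) and
    approximates each partial derivative by a central difference of step h.
    """
    variance = 0.0
    for i, se in enumerate(standard_errors):
        up, down = list(values), list(values)
        up[i] += h
        down[i] -= h
        derivative = (f(*up) - f(*down)) / (2 * h)
        variance += (derivative * se) ** 2
    return math.sqrt(variance)


def coefficient_of_variation(mean, sd):
    return 100 * sd / mean


def se_cv_closed_form(kappa, n):
    """100 * kappa * sqrt((2 kappa^2 + 1) / 2N), as given in the text."""
    return 100 * kappa * math.sqrt((2 * kappa ** 2 + 1) / (2 * n))


# Hypothetical population: mean 40, standard deviation 2, samples of 1000.
xi, sigma, n = 40.0, 2.0, 1000
se_mean = sigma / math.sqrt(n)       # standard error of the mean
se_sd = sigma / math.sqrt(2 * n)     # standard error of the standard deviation
print(round(delta_method_se(coefficient_of_variation, [xi, sigma], [se_mean, se_sd]), 4))  # 0.1121
print(round(se_cv_closed_form(sigma / xi, n), 4))                                          # 0.1121
```

The numerical derivatives make the same function usable for the ratio of two means or the square of the standard deviation without re-deriving anything by hand.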
Determination of Population Value from Sample

3.6. We have seen how a statistical constant determined from a sample may be used to test some hypothesis concerning the population value. Sometimes, however, we are unable to postulate the population value, and instead look to the sample to teach us something about it. This process of estimating the population value from the sample, or of arguing from the particular to the general, is the inductive method, and the question arises as to the accuracy of the estimate given by the sample. This may be answered if the sampling distribution of the constant is known, and we shall now illustrate the kind of answer provided by referring to samples of 100 individuals taken to measure the proportion having a given character, i.e. the proportion of successes.

[Fig. 7. Horizontal axis: Population Value, marked 0, π₃, π₂, π₁, 1.]

The population value of this proportion may be denoted by π* and any sample value by p. Then a diagram like that of Fig. 7 may be drawn (it is not drawn to scale), where for any population value π₁ the point B, for which p = π₁, represents the mean of all the sample values, and the points A and C represent values of p lying on the 5 per cent. level of significance. These points may be found from the sampling distribution of p, which is given by the terms of the binomial {(1 − π₁) + π₁}¹⁰⁰, and if we may assume this to be approximately normal,

$$AB = BC = \text{twice the standard error} = 2\sqrt{\pi_1(1 - \pi_1)/100}.$$

For example, if π₁ = 0.8, B is at p = 0.8, and the standard error is √(0.8 × 0.2/100) = 0.04; hence A is at p = 0.8 + 0.08 = 0.88 and C is at p = 0.8 − 0.08 = 0.72. In this way, points corresponding to A and C may be found for all possible values of π, and these lie on the curved lines shown diagrammatically in Fig. 7. The points corresponding to B lie on the line p = π. Then we know that, for any one value of π, 95 per cent. of the sample values lie within the limits given by the points at which an ordinate drawn at π cuts the curved lines, and so, by adding the experience for all populations, we know that 95 per cent. of the sample values lie within the limits of the curved lines, whatever may be the relative frequencies with which the different values of π occur.

* Here π is not 3.14159 . . .
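The construction of points A and C can be sketched as follows, under the same normal approximation to the binomial used in the text (samples of 100, limits at twice the standard error). The function name `sample_limits` is illustrative only.

```python
import math


def sample_limits(pi, n=100, multiple=2.0):
    """Points C and A of Fig. 7: pi -/+ multiple * sqrt(pi * (1 - pi) / n).

    Uses the normal approximation to the binomial, as in the text.
    """
    se = math.sqrt(pi * (1 - pi) / n)
    return pi - multiple * se, pi + multiple * se


lower, upper = sample_limits(0.8)
print(round(lower, 2), round(upper, 2))  # 0.72 0.88

# Tracing the two curved lines of Fig. 7 over a few population values.
for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    c, a = sample_limits(pi)
    print(f"pi = {pi:.1f}: C at p = {c:.3f}, A at p = {a:.3f}")
```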

If in making inferences from samples it is assumed that all sample-population points lie within the limits, i.e. that for every sample value p₀ (say) the population value lies between π₃ and π₂, we shall be right in 95 per cent. of the inferences. In this way, it is possible to estimate from a sample the range within which the population value lies, with a given long-run risk of being wrong.

Fisher (1930b) has termed the limits represented by the curved lines the fiducial limits and the corresponding proportion of correct inferences (95 per cent. in Fig. 7) the fiducial probability. Neyman (1934) uses the terms confidence limits and confidence coefficient. It is possible, of course, to determine limits corresponding to other fiducial probabilities, e.g. for the 99 per cent. level, and for any statistical constant that has a known continuous sampling distribution. In order to determine the sampling distribution of the constant, it is usually necessary to assume the form of the population, e.g. normal, and it may be necessary to assume the standard deviation. When the distribution is skew, difficulties of the kind mentioned in section 3.41 have to be faced.

In deriving the limits in Fig. 7 by the methods outlined, the result would be limits of p for known values of π; for practical purposes it is more convenient to have limits of π for known equidistant values of p. It is not proposed here to go into the method of arriving at this desired form of expression (simple inverse interpolation is effective), but it is pointed out that the limits of π corresponding to a sample value p are not p ± 2√(p(1 − p)/100). For example, when p₀ = 0.2, π₂ = 0.291 and π₃ = 0.132. It will be noted that the limits of π are not even equidistant from p₀. These conditions arise when the standard error of a statistical constant depends on the population value of the constant, as for the mean of a binomial or Poisson series. When this is not so, and the standard error is independent of the constant, the 95 per cent. fiducial limits of the population value are two parallel straight lines, and for any one sample value lie at twice the standard error above and below that value, assuming a normal sampling distribution. Thus the 95 per cent. limits of the mean height of Englishmen in Table 3.1 are 67.437 5 ± 2 × 0.032 38, i.e. 67.502 3 and 67.372 7 inches.
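The text obtains the limits of π by inverse interpolation. Under the same normal approximation, each curved line satisfies (p − π)² = 4π(1 − π)/100, a quadratic in π, so the limits can also be found directly from its two roots. The sketch below does this; the closed form is what is now usually called the Wilson score interval, a substitute for the interpolation the text describes rather than the author's own procedure. It reproduces π₃ = 0.132 and π₂ = 0.291 for p₀ = 0.2 and also shows the parallel-line case for the mean height.

```python
import math


def population_limits(p, n=100, multiple=2.0):
    """Limits of pi for a sample value p.

    Solves (p - pi)**2 = multiple**2 * pi * (1 - pi) / n, the equation of the
    curved lines of Fig. 7 under the normal approximation, for its two roots.
    """
    k = multiple ** 2 / n
    centre = (p + k / 2) / (1 + k)
    half_width = math.sqrt(k * p * (1 - p) + (k / 2) ** 2) / (1 + k)
    return centre - half_width, centre + half_width


pi_3, pi_2 = population_limits(0.2)
print(round(pi_3, 3), round(pi_2, 3))   # 0.132 0.291
# The naive p +/- 2 sqrt(p(1 - p)/100) would instead give 0.12 and 0.28.

# Constant standard error: parallel straight lines at the value +/- 2 SE.
mean_height, se_mean_height = 67.4375, 0.03238
print(round(mean_height - 2 * se_mean_height, 4),
      round(mean_height + 2 * se_mean_height, 4))   # 67.3727 67.5023
```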

Fiducial probability is not the same as inverse probability, although it appears superficially to be so. An inverse probability statement would be applied to any particular instance and would be of the form: "given a sample value p₀, the probability of the population value lying between π₃ and π₂ is 0.95" (see Fig. 7). This involves the conception of a super-population of population values corresponding to the one sample value p₀, 95 per cent. of which are between π₃ and π₂. Such a conception lies behind any statistical interpretation of inverse probability and presents many difficulties, one of the chief of which is that there is no basis for making any assumption regarding the distribution in the super-population. The fiducial probability statement says that, for all possible sample values taken from any populations, 95 per cent. of the population values of π lie between the limits shown in Fig. 7; no assumption is made as to the distribution of the population values.

Fiducial limits can only be calculated in the way shown in so far 
as the sampling distribution is continuous or may be approximately 
represented by a continuous distribution. 

Choice of Statistical Constants 

3.7. We have seen in Chapter I that there may be several statistical constants for measuring one characteristic of a sample or population; for example, the standard deviation, mean deviation, quartile deviation and mean range are all measures of dispersion. The question arises: are there any grounds on which one constant is to be preferred to another? Fisher's theory of estimation (1922, 1925a and 1936) provides an answer, and we shall give as much of it as seems necessary to show the kind of answer provided.

So far, we have regarded statistical constants as convenient 
measures of various characteristics of samples and populations, and 
have assumed some mathematical form of distribution for the popu- 
lation more or less incidentally. In the theory of estimation it is 
necessary to assume or specify some mathematical form of distribu- 
tion in the population as a starting-point. The equation of this 
form has one or more constants or parameters that may vary from 
one population of that form to another, and the particular values 
of which define any given population. For example, if the assumed 
form of the population is the normal curve described by equation (2.4), the two parameters are m and σ, and given values of these define a particular normal population. On this view, m and σ
are not thought of as the mean and standard deviation, i.e. as 
measures of position and dispersion ; they are thought of as mathe- 
matical parameters. Similarly, statistical constants (Fisher calls them 
statistics) calculated from samples are not descriptive averages ; they 
are estimates of the parameters in the equation, and it is on this 
basis that the relative values of equivalent statistics are judged. 
Here, we need concern ourselves with two only of the criteria given 
by Fisher; they are consistency and efficiency. 

Consistency. A consistent estimate of a parameter must equal the parameter when it is derived from the whole distribution in the population. In a normal population, all the above measures, suitably scaled, are consistent estimates of σ. This criterion is an obvious one and is satisfied by all statistical estimates in common use.

Efficiency. This criterion applies to those statistical estimates from large samples for which the sampling distribution is known and is normal, so that it is completely described by the mean and standard error.* The mean of the estimate in the sampling distribution may differ from the parameter, but this difference may be calculated and allowed for as bias. The standard error defines the inaccuracy of the estimate; here it will be more convenient to deal with the square of the standard error, viz. with the variance of the estimate.

* As an approximation, the ideas and criteria may be applied where the sampling distribution is nearly normal, i.e. to all constants the standard errors of which are given in this chapter, except β₂.

For statistics of the class with which we are now dealing, the variance in samples of N may be expressed in the form

$$\frac{1}{IN},$$

where I depends on the statistic and is called its intrinsic accuracy. For example, the intrinsic accuracy of the mean as an estimate of m in the normal population is 1/σ², and that of the square root of the second moment as an estimate of σ is 2/σ².

We have seen in section 3.53 that the various estimates of the standard deviation have differing standard errors, and hence intrinsic accuracies. Moreover, an increase in I has the same influence on the sampling variance as a proportionate increase in N, so that differences between intrinsic accuracies may be expressed in terms of N. Thus, for σ estimated from the second moment, I = 2/σ², and for the estimate obtained from the mean range in sub-samples of 10, I = 2/(1.34σ²). The variance of the estimate from the range in a sample of 100 is very nearly equal to that of the estimate from the second moment in a sample of 100/1.34 ≈ 75, for

$$\frac{\sigma^2}{2 \times 75} \approx \frac{1.34\,\sigma^2}{2 \times 100}.$$

The loss in accuracy due to using the range instead of the second moment is equivalent to a 25 per cent. reduction in the size of the sample. We may say that the estimate from the range has an efficiency of 75 per cent. relative to the estimate from the second moment.

For each parameter, there is a class of statistics that have the same intrinsic accuracy, which is greater than the intrinsic accuracy of all other estimates, and these are called efficient statistics. The ratio of the intrinsic accuracy of any given statistic to that of an efficient one is called the efficiency of the given statistic. The mean and standard deviation as defined in sections 1.22 and 1.23 are efficient estimates of m and σ in the normal population.
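The efficiency comparisons above reduce to ratios of intrinsic accuracies. The sketch below takes σ = 1 (any value cancels) and the variance factors quoted in this chapter: 1.34 for the mean range in sub-samples of 10, and 1.068² for the mean deviation, whose standard error is 1.068σ/√(2N). The function name is illustrative only.

```python
def intrinsic_accuracy(variance, n):
    """I such that variance = 1 / (I * n)."""
    return 1 / (variance * n)


sigma, n = 1.0, 100
# Variances of three estimates of sigma in samples of n (factors from the text).
i_moment = intrinsic_accuracy(sigma ** 2 / (2 * n), n)                 # 2 / sigma**2
i_range = intrinsic_accuracy(1.34 * sigma ** 2 / (2 * n), n)           # mean range, sets of 10
i_mean_dev = intrinsic_accuracy(1.068 ** 2 * sigma ** 2 / (2 * n), n)  # mean deviation

print(round(i_moment, 6))                  # 2.0
print(round(i_range / i_moment, 3))        # 0.746, the efficiency of the range
print(round(n * i_range / i_moment))       # 75, the equivalent sample size
print(round(i_mean_dev / i_moment, 3))     # 0.877, the efficiency of the mean deviation
```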

When using one estimate of a parameter, we may regard the sample as giving a certain quantity of information regarding the parameter, which increases in proportion to the size of the sample. Each individual may be regarded as contributing a unit amount of information, and the connection between the intrinsic accuracy and size of sample suggests that the intrinsic accuracy should be the unit. Fisher in his later writings (1935) has referred to intrinsic accuracy as the quantity of information given by a single observation. In these terms, the mean range in sub-samples of 10 may be said to give three-quarters of the amount of information given by the second moment regarding the parameter σ. Just as, for a given estimate, the number of individuals is additive in the sense that two independent samples of N₁ and N₂ give the same amount of information as one sample of N₁ + N₂, so for different estimates and samples, independent quantities of information are additive.

When combining estimates of a parameter that have normal sampling distributions, the estimates being from different samples, we may take a weighted mean, using the quantities of information as weights. This is analogous to combining estimates of the mean, say, by using the numbers in the samples as weights. The variance of the combined estimate is the inverse of the total quantity of information. This result is used in sections 8.22 and 11.73.
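The weighting rule can be sketched in a few lines: each estimate is weighted by its quantity of information (the inverse of its variance), and the variance of the result is the inverse of the total information. The estimates below are hypothetical values of σ from two independent samples of 100 and 50, each with variance σ²/(2N) and σ taken as 3.0 for the weights; with equally efficient estimates this reduces to weighting by sample size.

```python
def combine_estimates(estimates, variances):
    """Weighted mean using quantities of information (1/variance) as weights.

    Returns the combined estimate and its variance, the inverse of the total
    information. Assumes the estimates come from independent samples.
    """
    information = [1 / v for v in variances]
    total_information = sum(information)
    combined = sum(w * x for w, x in zip(information, estimates)) / total_information
    return combined, 1 / total_information


# Hypothetical estimates of sigma from samples of 100 and 50 (sigma taken as 3.0).
est, var = combine_estimates([3.10, 2.95], [3.0 ** 2 / 200, 3.0 ** 2 / 100])
print(round(est, 4), round(var, 5))  # 3.05 0.03
```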

There are several reasons why efficient statistics should be used 
except in special circumstances. The use of an inefficient statistic 
is equivalent to throwing away part of the data, and such a course 
is usually only justifiable if there is a proportionate saving in labour, 
say in computation. When efficient statistics are used, mistaken 
inferences of the second kind mentioned in section 3.3 are reduced 
to a minimum. Further, population values are estimated with maximum precision by the use of efficient statistics, for these have fiducial limits, corresponding to a given fiducial probability, that are closer
than for any other estimates of the same parameter. The methods 
of the next chapter apply only when efficient estimates are used. 

A distinction must here be made between different statistical 
constants that are merely mathematical transformations of one 
constant, and constants that are essentially different estimates and 
are only related statistically. For example, the variance and standard deviation are mathematically equivalent; samples that have the same variance must necessarily have the same standard deviation, and the two constants are equally efficient. On the other hand, the standard deviation, mean deviation and mean range in sets of 10 are only statistically related, in that the relations given in section 1.23 are true only on the average and do not necessarily apply to any particular sample; different samples having the same standard deviation may have different mean deviations and mean ranges. These estimates do not have the same efficiency.


Method of Maximum Likelihood 

3.8. This is a general method for arriving at estimates of parameters
of distributions, and Fisher has shown that estimates given by this 
method are efficient, provided efficient statistics of the parameter 
exist. 

We have already seen in section 2.72 how the probability of a 
given sample may be deduced when the population and its para- 
meters are known. When the form of population is known the 
expression for that probability (without the differential terms) for 
any assumed values of the parameters is termed the likelihood of 
the assumed values. The particular values that make this likelihood 
a maximum are the maximum likelihood estimates. They are obtained 
by differentiating the logarithm of the likelihood with respect to 
the parameters, equating to zero and solving the equations. 

For the normal distribution with assumed parameters m = x̂ and σ = ŝ,* and a sample of N, the logarithm of the likelihood (to the base e) is:

$$L = -\frac{N}{2}\log 2\pi - N\log\hat s - \frac{1}{2}\,\mathrm{S}\,\frac{(x_r - \hat x)^2}{\hat s^2}. \qquad (3.7)$$

If this is differentiated with respect to x̂ and the differential equated to zero, it is easily deduced that

$$\hat x = \frac{x_1 + x_2 + \dots + x_N}{N}.$$

Thus, the mean of the sample, as defined in section 1.22, is the maximum likelihood estimate of m. Similarly, by differentiating L in (3.7) with respect to ŝ and equating to zero, the maximum likelihood estimate of σ is found to be the standard deviation as defined in section 1.23.


* The Latin letters denote estimates from the sample and the sign ˆ
denotes the particular kind of estimate dealt with in this section. 
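A small numerical check of this result: the sketch evaluates equation (3.7) and confirms that moving either parameter away from the sample mean and the standard deviation (divisor N, not N − 1) lowers the likelihood. The sample of heights and the function names are hypothetical.

```python
import math


def normal_log_likelihood(xs, m, s):
    """Equation (3.7): log likelihood of assumed mean m and standard deviation s."""
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi) - n * math.log(s)
            - sum((x - m) ** 2 for x in xs) / (2 * s ** 2))


def normal_mle(xs):
    """Maximum likelihood estimates: the mean and the standard deviation with divisor N."""
    n = len(xs)
    m_hat = sum(xs) / n
    s_hat = math.sqrt(sum((x - m_hat) ** 2 for x in xs) / n)
    return m_hat, s_hat


# Hypothetical sample of heights in inches.
heights = [64.2, 66.8, 67.5, 68.1, 69.9, 65.4, 67.0, 70.3]
m_hat, s_hat = normal_mle(heights)
best = normal_log_likelihood(heights, m_hat, s_hat)
# Moving either parameter away from the estimates lowers the likelihood.
for dm, ds in [(0.05, 0), (-0.05, 0), (0, 0.05), (0, -0.05)]:
    assert normal_log_likelihood(heights, m_hat + dm, s_hat + ds) < best
print(round(m_hat, 4), round(s_hat, 4))
```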
3.81. When the data are in the form of a frequency distribution, with frequencies n₁, n₂, . . . nᵣ, . . . in the groups, the logarithm of the likelihood is (apart from a constant that does not involve the parameters)

$$L = \mathrm{S}\,(n_r \log \nu_r),$$

where S is the summation over all groups and νᵣ is the estimated frequency in the rth group, determined from the equation for the assumed population and expressed in terms of the unknown parameters.

For example, for the Poisson distribution the proportion of frequency with s successes is

$$\frac{e^{-\mu}\mu^s}{s!}.$$

If m̂ is the maximum likelihood estimate of μ, and the total number in the sample is N,

$$L = \mathrm{S}\left\{n_s \log\left(\frac{N e^{-\hat m}\,\hat m^s}{s!}\right)\right\}.$$

By equating to zero the first differential of this with respect to m̂ we have

$$\hat m = \frac{\mathrm{S}\,(s\,n_s)}{N}.$$

Thus, the efficient estimate of μ is the mean of the distribution. This result is not without interest, since we have seen in section 2.3 that the second moment of the distribution is also a possible estimate of μ, and hitherto no rational grounds have been advanced for regarding it as an inferior estimate.
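The Poisson result can be checked by evaluating the grouped log likelihood directly. The sketch below uses a hypothetical frequency distribution (nₛ individuals with s successes), computes the mean, and confirms with a crude grid search that no other trial value of m̂ gives a larger L.

```python
import math


def poisson_log_likelihood(frequencies, m):
    """L = S n_s log(N e^-m m^s / s!) for grouped counts {s: n_s}."""
    n_total = sum(frequencies.values())
    return sum(
        n_s * (math.log(n_total) - m + s * math.log(m) - math.lgamma(s + 1))
        for s, n_s in frequencies.items()
    )


def poisson_mle(frequencies):
    """Maximum likelihood estimate of mu: the mean of the distribution."""
    n_total = sum(frequencies.values())
    return sum(s * n_s for s, n_s in frequencies.items()) / n_total


# Hypothetical frequency distribution: n_s individuals with s successes.
counts = {0: 10, 1: 20, 2: 15, 3: 5}
m_hat = poisson_mle(counts)
print(m_hat)  # 1.3
# A crude grid search over trial values agrees with the closed form.
grid = [i / 1000 for i in range(500, 2500)]
best_m = max(grid, key=lambda m: poisson_log_likelihood(counts, m))
print(best_m)  # 1.3
```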

The method of maximum likelihood may be used whenever a 
distribution can be expressed in terms of parameters that are required 
to be estimated from the sample. Fisher (1936a) has used it in 
genetical studies for estimating linkages. 


3.82. When an efficient statistic has a normal sampling distribution, the square of the standard error, or variance, in a large sample may be obtained from the second differential of the logarithm of the likelihood L. Generally, if κ is the parameter, κ̂ the maximum likelihood estimate, and σ_κ̂ the standard error of κ̂,

$$\frac{1}{\sigma_{\hat\kappa}^2} = -\left[\frac{\partial^2 L}{\partial \hat\kappa^2}\right], \qquad (3.9)$$

where the square brackets [ ] signify "the mean for all possible sample values of the term contained therein."

For the normal distribution, on differentiating equation (3.7) twice with respect to x̂, we find

$$\frac{\partial^2 L}{\partial \hat x^2} = -\frac{N}{\hat s^2},$$

and the mean value of ŝ² for all possible samples is (in large samples) the population value σ², whence

$$\frac{1}{\sigma_{\hat x}^2} = \frac{N}{\sigma^2}, \qquad \text{i.e.} \qquad \sigma_{\hat x}^2 = \frac{\sigma^2}{N}.$$

This is an alternative derivation of the square of the standard error of the mean.
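Equation (3.9) can be checked numerically by taking a central-difference second derivative of the log likelihood. In the normal case below, ŝ is held at a hypothetical 0.8 and the constant terms of (3.7) are dropped; since L is quadratic in x̂, −∂²L/∂x̂² equals N/ŝ². As an extra illustration beyond the text, the same calculation for the Poisson likelihood of section 3.81 gives −∂²L/∂m̂² = N/m̂ at the estimate, so the variance of the mean is about m̂/N. All data are hypothetical.

```python
import math


def second_derivative(f, x, h=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


# Normal case: log likelihood in x_hat with s_hat held at an assumed value.
xs = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]   # hypothetical sample
s_assumed = 0.8                        # hypothetical value of s_hat


def log_lik_normal(m):
    # Terms of (3.7) that do not involve m are omitted.
    return -sum((x - m) ** 2 for x in xs) / (2 * s_assumed ** 2)


info = -second_derivative(log_lik_normal, 4.8)
print(round(info, 3), round(len(xs) / s_assumed ** 2, 3))  # 9.375 9.375

# Poisson case from section 3.81 (hypothetical frequencies).
counts = {0: 10, 1: 20, 2: 15, 3: 5}
n_total = sum(counts.values())
m_hat = sum(s * n for s, n in counts.items()) / n_total


def log_lik_poisson(m):
    # Terms of L that do not involve m are omitted.
    return sum(n * (-m + s * math.log(m)) for s, n in counts.items())


info_p = -second_derivative(log_lik_poisson, m_hat)
print(round(1 / info_p, 4), round(m_hat / n_total, 4))
```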
4.1. So far we have been using, as expressions of differences between distributions, constants like the mean, standard deviation, β₁ and β₂, which summarise their chief properties or are estimates of their parameters. Except for populations of particular forms, these do not express all the features of the distributions. It is desirable to have some index which measures the degree of difference between the actual frequencies in the groups, and so compares all the essential features. Such is K. Pearson's (1900) χ², which we shall first use to measure the deviations of an experimental distribution from the form of some hypothetical population. Both must be grouped in the same way, and the theoretical distribution must be adjusted to give the same total frequency; then if νₛ is the number of observations in any one group in the theoretical distribution, and nₛ is the corresponding number in the experimental one,

$$\chi^2 = \mathrm{S}\,\frac{(n_s - \nu_s)^2}{\nu_s},$$

where S is the summation over all groups. It will be appreciated that since (nₛ − νₛ) is squared, all differences in frequency, whether positive or negative, add a positive amount to χ², and further that the greater these differences are, the greater is χ²; if the two distributions are exactly alike, χ² is zero.
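The definition translates directly into code. The sketch below applies it to the observed and expected frequencies of Table 4.1 (heights, tail groups already combined). The exact sum is about 8.60, differing from the 8.59 given with that table only through rounding of the individual terms. The function name is illustrative only.

```python
def chi_square(observed, expected):
    """Pearson's chi-square: S (n_s - v_s)**2 / v_s over all groups."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same groups")
    return sum((n - v) ** 2 / v for n, v in zip(observed, expected))


# Frequencies of Table 4.1 (heights, tail groups already combined).
observed = [14.5, 17, 33.5, 61.5, 95.5, 142, 137.5, 154, 141.5, 116, 78, 49, 28.5, 9.5]
expected = [11.8, 17.7, 35.5, 62.8, 96.7, 130.1, 153.0, 157.1, 141.0, 110.5, 75.7,
            45.2, 23.7, 17.2]
print(round(chi_square(observed, expected), 2))  # 8.6
```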


Sampling Distribution of χ²

4.2. In using χ² to test whether one distribution differs from the other, we must remember that, because of random errors, χ² will never (or hardly ever) be zero, and we need to know its sampling distribution so that we can tell the probability of an observed χ² being given by a random sample from the hypothetical population. As usual, if that probability (which is symbolised by P) is low enough, the χ² is said to be significant, and it is unreasonable to suppose that such a significant value could be a result of sampling errors alone. Subject to certain restrictions, the distribution of χ² is given by the equation

$$df = y_0\,\chi^{g-1}\,e^{-\chi^2/2}\,d\chi, \qquad (4.2)$$

where df is the element of frequency between ordinates drawn at χ and χ + dχ, the frequency between χ = 0 and χ = χ₁ being the integral of df between those limits; y₀ is a constant and g is the number of degrees of freedom.

This conception of degrees of freedom is not altogether easy to attain, and we cannot attempt a full justification of it here; but we shall show its reasonableness and shall illustrate it, hoping that as a result of familiarity with its use the reader will appreciate it. Here the number of degrees of freedom is the number of groups, modified. Clearly, since the error in each group in the experimental distribution contributes a positive amount to χ², the greater the number of groups the larger would χ² be expected to be as a result of random variations alone, and account of this is taken through the quantity g. Further, it is often the practice to fit the theoretical distribution to the observations by calculating constants from the sample, just as in the example of section 2.51 we fitted a normal curve by making its mean, standard deviation and total equal to those of the sample. If we wish to test the adequacy of the theoretical form, further account must be taken of the degree to which we have made it fit the observations by this method. Suppose, in an extreme case, there were g' groups and we fitted a curve involving g' constants which were calculated from the data; then the two distributions would agree exactly and χ² would be zero, because sampling errors would have had no play. To take account of this second factor, we must subtract from the number of groups (say g') the number of constants that have been determined from the data in fitting,* in order to obtain the degrees of freedom (g). Every constant so determined has the effect, from the point of view of χ², of reducing the number of groups by one, and the number of degrees of freedom may be regarded effectively as the number of independent groups remaining to contribute to χ². When only the totals have been made equal, g = g' − 1, but in a case like the fitting of a binomial to the number of germinating seeds in Table 2.3, the theoretical distribution has been adjusted to make its mean and total both equal to those of the sample, and g = g' − 2.

One restriction under which equation (4.2) for the sampling distribution of χ² applies is that no group contains very few individuals; 10 is about the lower limit. To arrive at the distribution of χ² experimentally, we should have to take many samples. Now the probability of any random individual belonging to the sth group (say) is νₛ/N, where N is the total number in the sample, and consequently, in all the samples, the actual numbers falling into the sth group (nₛ) will vary according to the binomial

$$\left(\frac{\nu_s}{N} + \frac{N - \nu_s}{N}\right)^{N}$$

(this is a direct application of the binomial theory given in Chapter II). If νₛ is not too small, this binomial approximates to the normal distribution, and it is on the assumption that nₛ is distributed normally that equation (4.2) is obtained. This is the reason why, in using χ², no group should contain much fewer than ten individuals, and to satisfy this in practice the tail groups, which often have very low frequencies, are combined.

* Provided these constants are determined by processes which satisfy the criterion of efficiency.
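Combining the tail groups can be done mechanically. A minimal sketch, assuming the groups are listed from lowest to highest: end groups are merged inward until each end group has an expected frequency of at least 10, following the text, while totals are preserved. Interior groups with small expectations are not handled, and the function name and frequencies are hypothetical.

```python
def combine_tails(observed, expected, minimum=10.0):
    """Merge end groups inward until both end groups have expected >= minimum.

    Frequencies are ordered from the lowest group to the highest; totals are kept.
    """
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[0] < minimum:
        exp_low, obs_low = exp.pop(0), obs.pop(0)
        exp[0] += exp_low
        obs[0] += obs_low
    while len(exp) > 1 and exp[-1] < minimum:
        exp_high, obs_high = exp.pop(), obs.pop()
        exp[-1] += exp_high
        obs[-1] += obs_high
    return obs, exp


# Hypothetical distribution with sparse tails.
merged_obs, merged_exp = combine_tails([2, 5, 20, 40, 25, 6, 2],
                                       [1.5, 6.0, 22.0, 38.0, 24.0, 7.0, 1.5])
print(merged_obs, merged_exp)  # [27, 40, 33] [29.5, 38.0, 32.5]
```

After merging, the number of groups g' used in counting degrees of freedom is the number remaining, here three.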

A second restriction is the usual one that all the individuals in the sample must be independent. Indeed, χ² may be used as a test of independence.

In its sampling distribution, χ² may vary between zero and plus infinity, and since there is no question of negative deviations, and we are asking if the observed χ² is really greater than zero, the value at which an ordinate cuts off a tail of 5 per cent. also lies on the 5 per cent. level of significance. Since, in general practice, we should not test a very small value of χ² for the significance of its deviation from the average, we do not include the tail near χ² = 0 in the test for significance; this is in accordance with the principles discussed in section 3.32.

There are two sets of tables of the probability integral of equation (4.2). The first, calculated by Elderton, is included in Pearson's Tables for Statisticians and Biometricians, and gives the areas of the tails beyond integral values of χ² for different values of n', where n' is one more than the number of degrees of freedom (= g + 1). This was originally calculated for the case where the theoretical distribution is only made to agree with the data in respect of its total, and was wrongly applied to other cases for some time before the necessity for correcting to degrees of freedom (which was pointed out by Fisher in 1922b) was realised. The other tables, by Fisher (1936a), give values of χ² lying on different levels of significance, and in these the degrees of freedom are used directly and are called n.
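For integral degrees of freedom the tail area P of equation (4.2) has closed forms, so values like those read from Elderton's table can be computed directly. The sketch below uses the even-g form (a Poisson-type sum) and the odd-g form (a complementary error function plus a finite sum). It does not handle fractional degrees of freedom, and the function name is illustrative only. It reproduces the P values quoted later in this chapter for the heights (χ² = 8.59, g = 11) and germinating seeds (χ² = 2.897, g = 3).

```python
import math


def chi_square_tail(chi2, g):
    """P: tail area of the chi-square distribution beyond chi2, for g degrees of freedom.

    Uses the closed forms available for integer g; fractional g is not handled.
    """
    if not isinstance(g, int) or g < 1:
        raise ValueError("g must be a positive integer")
    half = chi2 / 2
    if g % 2 == 0:
        # Even g: P = e^(-x/2) * sum over k < g/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, g // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd g: P = erfc(sqrt(x/2)) + e^(-x/2) * sum_{k=1}^{(g-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(half))
    for k in range(1, (g - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total


print(round(chi_square_tail(8.59, 11), 2))   # 0.66  (heights, n' = 12)
print(round(chi_square_tail(2.897, 3), 2))   # 0.41  (germinating seeds, n' = 4)
```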

The χ² test is more universal in its application than most others, in that no assumption is made as to the normality of the distributions being compared.

First let us test the fit of the normal curve to the distribution of heights of men in Table 1.5; the arithmetical operations are set out in Table 4.1, and it will be noticed that the tail groups have been lumped together, giving 14 groups. Three constants have been fitted (total, mean and standard deviation), leaving 11 degrees of freedom, and we wish to see if the χ² (8.59) resulting from those 11 can be attributed to random variations. Entering Elderton's table at n' = 12, we find P = 0.66, and this is so large that we might reasonably suppose the deviations to have arisen from errors of random sampling, and we say that the normal curve gives a good fit. Indeed, 66 random samples in 100 would have given a χ² equal to or greater than 8.59. In computing χ², the total of the fourth column (= 0) checks the accuracy of the subtractions, and the terms in the sixth column are the products of those in the fourth and fifth.

TABLE 4.1

Stature in       Frequencies
Inches        Observed  Expected   nₛ − νₛ   (nₛ − νₛ)/νₛ   (nₛ − νₛ)²/νₛ
                (nₛ)      (νₛ)
below 61.5      14.5      11.8        2.7       0.229          0.62
61.5-           17        17.7       −0.7      −0.040          0.03
62.5-           33.5      35.5       −2.0      −0.056          0.11
63.5-           61.5      62.8       −1.3      −0.021          0.03
64.5-           95.5      96.7       −1.2      −0.012          0.01
65.5-          142       130.1       11.9       0.091          1.08
66.5-          137.5     153.0      −15.5      −0.101          1.57
67.5-          154       157.1       −3.1      −0.020          0.06
68.5-          141.5     141.0        0.5       0.004          0.00
69.5-          116       110.5        5.5       0.050          0.27
70.5-           78        75.7        2.3       0.030          0.07
71.5-           49        45.2        3.8       0.084          0.32
72.5-           28.5      23.7        4.8       0.203          0.97
over 73.5        9.5      17.2       −7.7      −0.448          3.45
Total        1 078     1 078.0        0.0                 χ² = 8.59
                                                          n' = 12
Similarly, the distribution of germinating seeds in Table 2.3 gives a χ² of 2.897 when the groups containing four or more seeds are combined; there are five groups and three degrees of freedom (the mean and total have been equalised), and entering Elderton's table at n' = 4, we find P = 0.41, showing that the fit is a good one.

An experimental distribution of flower colours and the Mendelian expectations are shown in Table 4.2 (Wheldale, 1907). There are 7 groups, and as only the total has been found from the sample (the expected frequencies, being determined from Mendelian theory, do not depend on the experimental data in any other way), there are 6 degrees of freedom (n' = 7); χ² = 9.81 and P = 0.13, hence the deviations of experiment from expectation cannot be regarded as significant.

TABLE 4.2

Colour of Flowers      Expected     Experimental
                       Frequency    Frequency
Magenta                  118.34        107
Magenta delila            39.44         42
Ivory                     52.59         67
Crimson                   39.44         42
Yellow                    17.53         24
Crimson delila            13.15         12
White                     93.51         80
Total*                   374.00        374
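The χ² for the flower colours can be reproduced from the two columns of Table 4.2. The sketch assumes an experimental frequency of 24 for Yellow (the total, 374, less the other six groups) and an expectation of 118.34 for Magenta (which makes the expectations total 374.00). Listing each group's contribution shows where the discrepancy comes from.

```python
colours = ["Magenta", "Magenta delila", "Ivory", "Crimson",
           "Yellow", "Crimson delila", "White"]
expected = [118.34, 39.44, 52.59, 39.44, 17.53, 13.15, 93.51]
observed = [107, 42, 67, 42, 24, 12, 80]

# Contribution (n_s - v_s)**2 / v_s of each colour class.
contributions = {c: (n - v) ** 2 / v for c, n, v in zip(colours, observed, expected)}
chi2 = sum(contributions.values())
g = len(colours) - 1          # only the total was taken from the sample
print(round(chi2, 2), g)      # 9.81 6
for colour, term in sorted(contributions.items(), key=lambda item: -item[1]):
    print(f"{colour:15s} {term:6.2f}")
```

Ivory and Yellow contribute most of the total, but with 6 degrees of freedom the sum is still not significant.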

The Additive Nature of χ²

4 . 3 . A useful property of is that several values can be added 
together, and if the degrees of freedom are also added, the total 
deviations of several distributions can be tested by entering the 
tables at the total degrees of freedom and finding the probability 
corresponding to the total x^- For instance, for the three distributions 
we have just tested, the total x^ is 21 • 30, the total degrees of freedom 
are 20, and P for this value of x^ and w' = 21 is o • 38 ; thus, taken alto- 

* The figures given in the paper add up to 373*98, and we have added 
0*01 of a unit to each of the largest groups to make the total correct. There 
are also two groups with expected frequencies of zero, but these must be 
left out in this test. 
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gether, the differences are still attributable to random errors. This is 
the correct way of combining the experiences of several distributions, 
and it is quite wrong to find an average P or 
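The probability integral that the text reads from Elderton's and Fisher's tables can be sketched in Python for whole-number degrees of freedom. This is a minimal sketch and not the method used to construct those tables. The names `chi2_sf` and `combine` are hypothetical, and the recurrence Q(x; v + 2) = Q(x; v) + (x/2)^(v/2) e^(−x/2)/Γ(v/2 + 1) is a standard identity used here in place of table look-up. The tables are entered at n', which is the degrees of freedom plus one. The function instead takes the degrees of freedom directly.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Probability that chi-squared on `df` degrees of freedom exceeds x.

    Hypothetical helper for whole-number df. It starts from the closed form
    for df = 1 (complementary error function) or df = 2 (exponential) and
    climbs two degrees of freedom at a time using
    Q(x; v + 2) = Q(x; v) + (x/2)**(v/2) * exp(-x/2) / Gamma(v/2 + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, v = math.exp(-x / 2), 2
    else:
        q, v = math.erfc(math.sqrt(x / 2)), 1
    while v < df:
        # each added term is computed on the log scale to avoid overflow
        q += math.exp((v / 2) * math.log(x / 2) - x / 2 - math.lgamma(v / 2 + 1))
        v += 2
    return q


def combine(results):
    """Add (chi-squared, degrees of freedom) pairs and test the totals together."""
    total_chi2 = sum(chi2 for chi2, _ in results)
    total_df = sum(df for _, df in results)
    return total_chi2, total_df, chi2_sf(total_chi2, total_df)


if __name__ == "__main__":
    # germinating seed: chi-squared 2.897 on 3 degrees of freedom (n' = 4)
    print(round(chi2_sf(2.897, 3), 2))   # 0.41
    # the three distributions together: 21.30 on 20 degrees of freedom (n' = 21)
    print(round(chi2_sf(21.30, 20), 2))  # 0.38
```

`combine` adds the χ² values and their degrees of freedom before finding P. This follows the rule in the text, rather than averaging separate probabilities.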

To illustrate further this additive character of χ², Table 4.3 has
been compiled from some data of Mendel, quoted by Bateson (1913),
and gives the numbers of round and angular peas from ten plants,


TABLE 4.3
Frequencies of Peas

Plant       Round     Angular    Total     Ratio
Number      n_r                  N         n_r/N       χ²
  1          45         12         57      0·789 5    0·47
  2          27          8         35      0·771 4    0·09
  3          24          7         31      0·774 2    0·10
  4          19         10         29      0·655 2    1·39
  5          32         11         43      0·744 2    0·00
  6          26          6         32      0·812 5    0·67
  7          88         24        112      0·785 7    0·76
  8          22         10         32      0·687 5    0·67
  9          28          6         34      0·823 5    0·98
 10          25          7         32      0·781 2    0·17
Total       336        101        437
Expected    327·75     109·25     437·00


together with the ratios of the numbers of round peas to the totals.
These ratios vary considerably, and we will see how far the variations
may be explained on the hypothesis that the peas from the
ten plants are effectively ten random samples from an infinite
population in which the ratio is 0·75 (this is the Mendelian expectation).
We might calculate for each plant the quantity

$$\frac{\text{Deviation in Ratio from } 0.75}{\text{Standard Error}},$$

and since the sampling distribution of each ratio is binomial, this is

$$\frac{n_r/N - 0.75}{\sqrt{0.75\,(1 - 0.75)/N}}.$$

If the deviations were random, these ratios would be distributed
approximately normally with unit standard deviation, and none of the
ten would be expected to exceed 2·0. Alternatively we may calculate
ten values of χ², and since there are two frequency groups, each
plant contributes one degree of freedom. In this simple case, χ²
happens to be the square of the above ratio.

Fisher gives the value of χ² on the 0·05 level of significance
as 3·841 for one degree of freedom, and as none of those in Table
4.3 is as large as this, no individual plant differs significantly from
expectation. Further, for all plants combined, the proportion of
round peas is 0·768 9, giving a χ² of 0·96, which also is insignificant.
It may be, however, that the variability in the proportion of round
peas from plant to plant is greater than can be explained by random
errors, when all are considered together. To make this clear, we may
imagine an extreme case in which the values of χ² for all the plants are
near the level of significance, but the ratio n_r/N varies above and below
expectation, so that the ratio for all plants combined is near expectation.
Then, although no individual plant appears to differ significantly
from expectation, it is unlikely that all would be so near the
level of significance unless the deviations as a whole were
real, but in different directions, so that when added they average
out. To test such a point we may add the values of χ², giving a total
of 5·30 for 10 degrees of freedom (n' = 11), so that P = 0·87, and
we conclude from the data that the plants do not vary significantly,
and they may be regarded as so many random samples of peas.
Had there been enough plants, instead of adding the values of χ²
we could have formed a frequency distribution of them, and have
compared it with the theoretical form for one degree of freedom.
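The per-plant calculation for Table 4.3 can be sketched in Python. This is an illustrative sketch with hypothetical function names. It takes plant 10 as 25 round and 7 angular peas, which are the counts implied by its total of 32, its ratio 0·781 2 and the column totals 336 and 101. For each plant it computes χ² against the 3 : 1 expectation and checks that this equals the square of the standardised deviation in the ratio. It then adds the ten values. Because 10 is even, the probability integral has the closed form e^(−x/2) Σ_{k<5} (x/2)^k/k!, so no general routine is needed.

```python
import math

# (round, angular) counts for plants 1 to 10 of Table 4.3
PEAS = [(45, 12), (27, 8), (24, 7), (19, 10), (32, 11),
        (26, 6), (88, 24), (22, 10), (28, 6), (25, 7)]


def chi2_two_groups(round_, angular, p=0.75):
    """Chi-squared for one plant against the expected proportion p of round peas."""
    n = round_ + angular
    e_round, e_angular = n * p, n * (1 - p)
    return (round_ - e_round) ** 2 / e_round + (angular - e_angular) ** 2 / e_angular


def standardised_deviation(round_, angular, p=0.75):
    """(observed ratio - p) divided by the binomial standard error sqrt(p(1 - p)/N)."""
    n = round_ + angular
    return (round_ / n - p) / math.sqrt(p * (1 - p) / n)


def chi2_sf_even(x, df):
    """P(chi-squared > x) for an even number of degrees of freedom."""
    if df % 2:
        raise ValueError("this closed form needs even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


if __name__ == "__main__":
    values = [chi2_two_groups(r, a) for r, a in PEAS]
    for plant, v in enumerate(values, start=1):
        print(plant, round(v, 2))
    total = sum(values)
    print("total", round(total, 2), "P", round(chi2_sf_even(total, len(values)), 2))
```

At full precision, plant 5 gives about 0·008 rather than the printed 0·00. The total still agrees with 5·30.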

The χ² test is thus an extension of the ordinary method of finding
the significance of a single binomial mean, and enables the information
given by a number of them to be combined.

4.31. It follows, of course, that if a number of values of χ² and their
degrees of freedom may be added and treated as one, a total χ²
may be split up into parts, and each part may be tested separately.

Contingency Tables 

4.4. The problem of the sex-ratio of rats (in Table 3.5, p. 83) may
be looked at in a different way. We are not concerned with the total
numbers of males, of females, or of young resulting from the two
diets, but with the distribution of young in the four cells giving
two sex-ratios. If diet has had no effect on the sex-ratio, the 571
observations would be expected to be distributed at random in the
four cells, with the one restriction that they should add up to give
the totals of the table. In the infinite population of tables with those
totals, the probability of an observation falling in the “deficient”
group is 276/571, and that of it falling in the male group is 268/571,
so that the probability of it falling in the “deficient” male square is

$$\frac{268}{571} \times \frac{276}{571},$$

and the expected number of individuals in that square is that
probability multiplied by 571, namely 129·54. Similarly, the other squares can


TABLE 4.4
Expected Frequencies

Diet                         Males      Females     Total
Vitamin B deficient         129·54      146·46     276·00
Vitamin B sufficient        138·46      156·54     295·00
Total                       268·00      303·00     571·00


be filled as in Table 4.4; they are the frequencies that would be
expected if sex were independent of diet and the individuals were
distributed at random. We may now test to see if Table 3.5 differs
significantly from Table 4.4 by finding χ² for the four cells, and then
the P for one degree of freedom. There is only one degree of freedom,
since only one cell can be filled independently; the numbers in the
others can be obtained from that one and the totals. In our example,
χ² = 1·20 and P = 0·28 (from Fisher's table); the deviations are
not greater than can be attributed to random errors. It will be noticed
that this probability is practically the same as that obtained previously
from the standard error of the ratios (0·27); indeed, it should be, for
both methods are equivalent, being based on the assumption that
the number in a cell (or its ratio to the total) is distributed normally.
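Each cell's expected frequency in Table 4.4 is (row total × column total)/grand total, so the marginal totals alone determine the table. The degrees of freedom are (rows − 1)(columns − 1). The following minimal Python sketch uses hypothetical function names and reproduces the table from the totals quoted in the text. The observed counts of Table 3.5 are not repeated in this section, so χ² itself is not recomputed here.

```python
def expected_frequencies(row_totals, col_totals):
    """Expected cell counts if the two characters are independent."""
    grand = sum(row_totals)
    if grand != sum(col_totals):
        raise ValueError("row and column totals must have the same sum")
    return [[r * c / grand for c in col_totals] for r in row_totals]


def degrees_of_freedom(n_rows, n_cols):
    """Number of cells that can be filled freely once the margins are fixed."""
    return (n_rows - 1) * (n_cols - 1)


if __name__ == "__main__":
    # rows: vitamin B deficient, sufficient; columns: males, females
    table = expected_frequencies([276, 295], [268, 303])
    for row in table:
        print([round(x, 2) for x in row])
    print(degrees_of_freedom(2, 2))
```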
Table 3.5 is an example of a fourfold contingency table. When the
individuals in a sample have two characters, and a frequency table
is made classifying them according to both so as to show the relation
between the characters, the result is a contingency table. In Table 3.5,
the two characters are sex and vitamin B content of diet, and complete
information regarding these for any one rat is given by its position
in the table. If the numbers in the cells are not randomly distributed,
the two characters are said to be associated; the sex of the offspring
in our example is not associated with the vitamin B content of the
diet of the parent. Contingency tables may be manifold, and if there

TABLE 4.5

                                        Severity of Attack
                         Haemor-   Confluent   Abundant   Sparse    Very     Totals
                         rhagic                                     Sparse
Years since   0–10          0          1           6        11        12        30
vaccination              (0·87)     (5·13)      (9·02)    (8·60)    (6·38)
              10–25         5         37         114       165       136       457
                        (13·25)    (78·20)    (137·45)  (130·96)   (97·14)
              25–45        29        155         299       268       181       932
                        (27·04)   (159·47)    (280·32)  (267·07)  (198·10)
              over 45      11         35          48        33        28       155
                         (4·50)    (26·52)     (46·62)   (44·42)   (32·94)
Unvaccinated                4         61          41         7         2       115
                         (3·34)    (19·68)     (34·59)   (32·95)   (24·44)
Total                      49        289         508       484       359      1689


are n rows and m columns, there are (n − 1)(m − 1) degrees of
freedom.

Table 4.5 is a 5 × 5 contingency table, being Brownlee's data of
the severity of smallpox attack and degree of vaccination (quoted
by Pearson, 1910); below the frequencies, the expectations for
independence are given in brackets.

As there are several expected frequencies less than ten in the first
row and column, these have been combined with the second row and
column in working out χ², so forming a 4 × 4 table. χ² = 196·33,
there are 9 degrees of freedom, and P is less than 0·000 001; hence
the association between degree of vaccination and severity of attack
is overwhelmingly significant. Notice, however, that this gives no
information as to the strength of association; a smaller sample would
give a lower significance for the same association, and a larger sample
would give a higher significance. Indeed, this test does not even tell
us whether increased severity of attack is associated with a longer
or a shorter period since vaccination. Having established, however,
that the effect is real, we glance at the table again, and notice that
for the more recently vaccinated patients there tends to be a deficit
of severe attacks and an excess of milder ones; for the remotely
vaccinated, the reverse is the case, and thus we establish the sense
of the association; a measure of its strength will be discussed in
section 9.5. Readers must be warned that association does not
necessarily mean a causal relationship; it only means that in the
population sampled there is a tendency for variations in the
two characters to occur together. A much closer analysis is
necessary to investigate causes; this also will be discussed later
in section 7.3.
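The χ² for Table 4.5 can be recomputed from the observed frequencies. The following Python sketch is a hypothetical illustration and is not the author's worksheet. Before forming expectations from the margins, `merge_first_two` pools the first row into the second and the first column into the second, as described above. The sketch carries full precision rather than rounded expectations, so its total may differ slightly from the printed 196·33.

```python
def chi2_contingency(table):
    """Pearson's chi-squared and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_totals) - 1)


def merge_first_two(table):
    """Add row 0 into row 1 and column 0 into column 1, dropping row 0 and column 0."""
    rows = [list(row) for row in table]
    rows[1] = [a + b for a, b in zip(rows[0], rows[1])]
    return [[row[0] + row[1]] + row[2:] for row in rows[1:]]


# Table 4.5: columns are haemorrhagic, confluent, abundant, sparse, very sparse
SMALLPOX = [
    [0, 1, 6, 11, 12],         # 0-10 years since vaccination
    [5, 37, 114, 165, 136],    # 10-25
    [29, 155, 299, 268, 181],  # 25-45
    [11, 35, 48, 33, 28],      # over 45
    [4, 61, 41, 7, 2],         # unvaccinated
]

if __name__ == "__main__":
    stat, df = chi2_contingency(merge_first_two(SMALLPOX))
    print(round(stat, 2), df)
```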

The test of association in Table 4.5 may be looked at in another
way. The table consists effectively of a number of frequency
distributions of years since vaccination, and χ² is the measure of the
deviations of these distributions from hypothetical ones deduced
from the “totals” column and differing only in total frequencies.
The independent constants of these hypothetical distributions are
four of the proportionate frequencies in the totals column; the fifth
may be obtained from the others, since they all add up to unity.
In addition to these, the five totals in the last row give the five totals
of the hypothetical distributions, so that altogether nine “constants”
have been fitted, leaving 25 − 9 = 16 degrees of freedom for a 5 × 5 table,
in agreement with (n − 1)(m − 1). Alternatively, the table may be
regarded as a collection of five distributions in which the variate is
the severity of attack.

For expressing and testing association, the contingency table is 
exceedingly useful, since the characters need only be described 
qualitatively, and need not be given in any particular order. A 
rearrangement of the rows or columns in Table 4.5 does not affect 
the result. 


4.41. A useful application of the χ² test is to the comparison of a
number of frequency distributions, e.g. for the purpose of checking
sampling technique. The separate distributions may be regarded as
rows in a contingency table, and if the χ² is large enough, they are
significantly different. As a special case of this, when there are two
distributions and g' groups in each, there are g' − 1 degrees of
freedom, and the expression for χ² becomes

$$\chi^2 = N_1 N_2 \, S \frac{\left( n_1/N_1 - n_2/N_2 \right)^2}{n_1 + n_2},$$

where N_1 and N_2 are the two totals (they need not be equal),
n_1 and n_2 are the frequencies in one corresponding group in
the two distributions, and S is the summation over all groups.

The grouping must, of course, be the same for both distributions 
under comparison. These tests are alternative to those involving 
only the means and standard deviations, but are more complete, since 
they compare the distributions in all respects. 
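For two distributions, the general contingency calculation and the short formula above should agree. The following hypothetical Python sketch checks this on two rows of Table 4.5, treated as frequency distributions of severity of attack: patients vaccinated over 45 years before, and the unvaccinated. Because n_1 + n_2 appears as a divisor, any group that is empty in both distributions would have to be dropped first.

```python
def chi2_two_distributions(a, b):
    """Short formula: N1 * N2 * S[(n1/N1 - n2/N2)**2 / (n1 + n2)]."""
    n1_total, n2_total = sum(a), sum(b)
    return n1_total * n2_total * sum(
        (n1 / n1_total - n2 / n2_total) ** 2 / (n1 + n2) for n1, n2 in zip(a, b)
    )


def chi2_contingency(table):
    """General Pearson chi-squared and degrees of freedom, for comparison."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = sum(
        (obs - row_totals[i] * col_totals[j] / grand) ** 2
        / (row_totals[i] * col_totals[j] / grand)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    return stat, (len(table) - 1) * (len(col_totals) - 1)


OVER_45 = [11, 35, 48, 33, 28]
UNVACCINATED = [4, 61, 41, 7, 2]

if __name__ == "__main__":
    short = chi2_two_distributions(OVER_45, UNVACCINATED)
    general, df = chi2_contingency([OVER_45, UNVACCINATED])
    print(round(short, 4), round(general, 4), df)  # df = g' - 1 = 4
```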


General Notes on the Distribution of χ²

4.5. The distribution of χ² is of considerable general importance in
statistical theory, and a proper understanding of its nature is desirable.
We have introduced it as it first appeared historically, as a measure
of the differences between two frequency distributions. More generally,
however, if we have any quantity x which is distributed normally
about zero with unit standard deviation, and we add the squares of
g independent values of x, the distribution of this sum in the infinite
population is that of χ² for g degrees of freedom.

Thus, the distribution of χ² is closely related to that of N times
the variance, in samples of N from a normal population in which
the standard deviation is unity. Indeed, the sum of squares of g
independent deviations from the population mean is distributed in the
same way as the sum of squares of g + 1 deviations from the sample
mean, so that if in equation (2.8), p. 66, we write

$$N = g + 1, \qquad s^2 = \frac{\chi^2}{g + 1} \qquad \text{and} \qquad \sigma = 1$$

in the distribution of s, the equation reduces to the form of (4.2).
Tables of the probability integral of χ² only go up to g = 30, but
for larger values of g it is sufficient to assume √(2χ²) to be normally
distributed about a mean of √(2g − 1) with unit standard deviation
(Fisher, 1936a). Notice, however, that here we are asking not “is
the observed √(2χ²) significantly different from the mean √(2g − 1)?”
but rather “is the observed √(2χ²) significantly greater than zero?”
and consequently the deviation of √(2χ²) which has a tail 5 per cent.
of the whole curve is equivalent to the 5 per cent. level of
significance. Such a deviation is 1·65 times the standard error, and
√(2χ²) − √(2g − 1) has to be greater than 1·65 to be significant.
When g = 30, χ² lying on the 5 per cent. level on this assumption
is 43·5, whereas Fisher's tables based on the correct distribution
give 43·773, so the approximation is close enough for large
samples.
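Fisher's approximation can be compared with the exact probability integral. In this hypothetical Python sketch, `chi2_5pc_approx` solves √(2χ²) − √(2g − 1) = 1·65 for χ². `chi2_sf_even` is the closed-form probability integral for an even number of degrees of freedom, which suits g = 30. It is used to check that the tabulated 43·773 cuts off 5 per cent.

```python
import math


def chi2_5pc_approx(g, deviate=1.65):
    """Chi-squared on the 5 per cent. level from sqrt(2 chi2) ~ normal(sqrt(2g - 1), 1)."""
    return (math.sqrt(2 * g - 1) + deviate) ** 2 / 2


def chi2_sf_even(x, df):
    """Exact P(chi-squared > x) when df is even."""
    if df % 2:
        raise ValueError("this closed form needs even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


if __name__ == "__main__":
    approx = chi2_5pc_approx(30)
    print(round(approx, 1))                    # 43.5
    print(round(chi2_sf_even(43.773, 30), 3))  # 0.05
    print(round(chi2_sf_even(approx, 30), 3))  # a little above 0.05
```

The approximate point falls slightly below the exact one. It therefore cuts off a tail a little larger than 5 per cent., which agrees with the comparison of 43·5 and 43·773 in the text.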



CHAPTER V 


SMALL SAMPLES 

It is not always possible to obtain large samples, such as are suitable 
for application of the methods of the last chapters, so the theory has 
been developed to give more exact methods suitable for small num- 
bers. We stated in section 3.1 that there is nothing in the definition 
of a random sample that depends on its size, and would reiterate 
this in view of the existence of a fairly common impression to the 
contrary, of which the following quotation is typical. 

“Moreover, in agricultural experiment, the number of observations
rarely, if ever, is sufficiently large to allow full play to the laws of
chance.”

This is a misconception, for the laws of chance (as we know them) 
have full play every time a single individual is drawn from the popu- 
lation, and the theory of random sampling may be applied to small 
samples. There are two kinds of modification of the theory for large 
samples. The first consists in developments to the theory of esti- 
mation, which do not alter the general principles described in 
section 3.7. We need not concern ourselves with them any further. 
The second kind of modification is of practical importance and 
consists in using exact sampling distributions instead of the approxi- 
mate ones used with large samples. We shall deal with this more fully. 

Unfortunately, although the limitation of size has been removed,
that of form has not, and most of the following results apply only to
quantities distributed normally.

Variance Estimated from Small Samples 

5.1. With a few observations, it is futile to form a frequency
distribution, but the usual frequency constants may be calculated, and
regarded as estimates, obtained from the sample, of the constants of
the infinite population. For the normal population, the mean is found
in exactly the same way as for large samples, but the best estimate
of the variance (σ²) is obtained by dividing the sum of the squares
of the deviations from the mean, not by the number of observations,
but by the number of degrees of freedom. Here the number of degrees
of freedom is the number of deviations minus the number of constants
determined from the sample and used to fix the points from
which those deviations are measured; in the simple case, when the
mean only is found from the sample, the degrees of freedom are one
less than the number of observations. We will illustrate this by the
data of Table 1.3, which are the 100 random observations from an
artificially constructed population, divided into groups of 5. We may
find the variance either from the squares of the deviations from
the grand mean or, regarding each group of five as a small sample,
from the squares of the deviations from the sample means. For
the latter process, instead of finding the variance for each sample
separately, we may sum the squares of all the deviations and divide
by the total degrees of freedom contributed by the twenty samples.
The sum of squared deviations from the grand mean is 8 864·75,
and on dividing this by 99 we obtain the variance, viz. 89·5. The
sum of squares from the sample means is 7 056·4, and since each
sample contributes four degrees of freedom, there are 80 degrees
altogether and the variance is 88·3. There is quite a fair agreement
between the two estimates.* If we had used the old method for
large samples we should have obtained the values

$$\frac{8\,864.75}{100} = 88.6 \qquad \text{and} \qquad \frac{7\,056.4}{100} = 70.6,$$

with a much poorer agreement. In a large sample, the difference
between dividing by N and (N − 1) is quite unimportant.

As a special case, it may be shown that the mean variance for
a number of pairs is

$$\frac{S(x_1 - x_2)^2}{2M},$$

where x_1 and x_2 are the individuals of any pair, M is the number
of pairs and S is the summation over all pairs. In this way (each pair
contributing one degree of freedom), the variance of any character
between brothers from the same parents can be obtained as accurately
from pairs of brothers from a hundred families as from a single family
of one hundred and one brothers.
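Pooling sums of squares across small samples, and the special case for pairs, can be sketched in Python. The function names are hypothetical. The check uses pseudo-random pairs rather than data from the text. It confirms that for samples of two, the pooled estimate S(x − x̄)²/(total degrees of freedom) reduces to S(x_1 − x_2)²/2M. This holds because each pair contributes (x_1 − x_2)²/2 to the sum of squares and one degree of freedom.

```python
import random


def pooled_variance(samples):
    """Sum of squared deviations from each sample's own mean, over total degrees of freedom."""
    ss = 0.0
    df = 0
    for sample in samples:
        mean = sum(sample) / len(sample)
        ss += sum((x - mean) ** 2 for x in sample)
        df += len(sample) - 1
    return ss / df


def pairs_variance(pairs):
    """S(x1 - x2)**2 / 2M for M pairs."""
    return sum((x1 - x2) ** 2 for x1, x2 in pairs) / (2 * len(pairs))


if __name__ == "__main__":
    rng = random.Random(1)
    pairs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
    print(pooled_variance(pairs), pairs_variance(pairs))
```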


* We have not tested the agreement by the ordinary sampling theory,
using the standard errors of the values, since the values are not
independent; they have been taken from the same data.

5.11. The justification for using the new estimate when the deviations
are measured from the mean arises mathematically from the
sampling distribution of the variance. If v' is the sample value of
the variance as defined in section 1.23 and s is the corresponding
standard deviation,

$$v' = s^2,$$

and the sampling distribution of v' may easily be derived from that
given for s in section 2.72; it is

$$df = K_2\, v'^{\frac{N-3}{2}}\, e^{-\frac{N v'}{2\sigma^2}}\, dv',$$

where K_2 is a constant and N is the size of sample. The mean
value of v' obtained by integrating this distribution is

$$\frac{N - 1}{N}\,\sigma^2.$$

Thus, as an estimate of σ², v' tends on the average to give a biassed
value that is slightly smaller than the population value. It is
preferable in dealing with small samples to use the unbiassed estimate,

$$\frac{N v'}{N - 1} = \frac{S(x - \bar{x})^2}{N - 1}.$$
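The bias described above can be seen by simulation. This hypothetical Python sketch draws pseudo-random normal samples with σ = 1. Averaged over many samples of N = 5, the divide-by-N variance v' should settle near (N − 1)/N = 0·8, while dividing by N − 1 should settle near 1.

```python
import random


def average_variance_estimates(n, reps, sigma=1.0, seed=0):
    """Average v' (divisor N) and the unbiassed estimate (divisor N - 1) over many samples."""
    rng = random.Random(seed)
    total_biased = total_unbiased = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)
        total_biased += ss / n
        total_unbiased += ss / (n - 1)
    return total_biased / reps, total_unbiased / reps


if __name__ == "__main__":
    biased, unbiased = average_variance_estimates(n=5, reps=50_000)
    print(round(biased, 3), round(unbiased, 3))
```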


Significance of Means: The t Test

5.2. When testing the significance of the deviation of a sample mean
from an assumed population value, we used the fact that the ratio
of the deviation to its standard error is distributed normally with
unit standard deviation.* If d is the deviation, σ is the population
value of the standard deviation and N is the size of sample, this
ratio is

$$\frac{d}{\sigma/\sqrt{N}}.$$

Now, in practice we do not always know σ and have to substitute
the sample estimate, s. When dealing with large samples, s and σ
are sufficiently alike to make this course reasonably accurate; but
when the samples are small, some allowance must be made for the
error involved in using s instead of σ.

* The ratio is the w of equation (2.5), section 2.51.

The ratio by which we test the significance of a deviation may
be written

$$t = \frac{d}{s/\sqrt{N}},$$

and if we know the sampling distribution of t, we may carry out
the test exactly, without being limited to large samples. This was
first pointed out by “Student” (1908) and he gave some tables of
the sampling distribution of a quantity z closely related to t. These
have now been generally superseded by tables of t given by “Student”
(1925) and Fisher (1936a). When computing t for use with Fisher's
tables, s must be taken as the square root of the variance calculated
correctly from the sample by dividing the sum of squares of
deviations by the degrees of freedom. Thus,

$$d = \frac{Sx}{N} - \xi, \qquad s = \sqrt{\frac{S(x - \bar{x})^2}{N - 1}}, \qquad (5.1)$$

where ξ is the population mean.

The sampling distribution of t is symmetrical and depends only
on the number of degrees of freedom on which s has been estimated;
it approaches the normal form as the degrees of freedom increase.
Fisher's tables give, for different degrees of freedom, called n, values
of t lying on various levels of significance, the levels being the sum
of the two “tails” of the distribution as for the third column of our
Table 2.5.

For large samples, t = 2·0 lies on the 0·05 level of significance;
other values that lie on the same level are: t = 12·7 for samples
of 2 (n = 1), t = 4·3 for those of 3 (n = 2), t = 3·2 for those of 4
(n = 3) and t = 2·3 for those of 10 (n = 9); the effect of the
non-normality of t is serious for samples much smaller than 20.

Table 5.1 shows the effect of a small electric current on the
growth of maize seedlings, giving the difference between the
elongation of the treated and untreated in parallel pairs of boxes
(data from Collins, Flint and McLane, 1929), a positive difference
showing that the electric treatment increased the rate of growth. The
mean is 4·29 mm.; is this significantly different from zero? The sum of
squares of deviations from the mean is 937·189 and since on our
hypothesis ξ = 0,

$$t = 4.29\sqrt{\frac{90}{937.189}} = 1.33,$$

and n = 9; according to the tables, t = 1·38 lies on the 0·2 level
of significance, and there is thus no evidence from the sample that
the treatment has made any difference to growth. The mean elongation
is not large enough compared with the variations between those
of the separate boxes to be significant.


TABLE 5.1

Elongation in mm. (Treated − Untreated)

   0·0      1·3     10·2     23·9      3·1
   6·8     −1·5    −14·7      3·3     11·1

Mean 4·29
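The arithmetic of the t test for Table 5.1 can be sketched from the summary figures quoted in the text: mean 4·29 mm., sum of squares 937·189 and N = 10. The Python function names are hypothetical. The critical value of 1·38 for the 0·2 level on 9 degrees of freedom is taken from the text rather than computed.

```python
import math


def t_from_summary(mean, sum_sq, n, hypothetical_mean=0.0):
    """t = (mean - xi) / (s / sqrt(N)), with s**2 = sum of squares / (N - 1)."""
    s = math.sqrt(sum_sq / (n - 1))
    return (mean - hypothetical_mean) / (s / math.sqrt(n))


def t_from_sample(values, hypothetical_mean=0.0):
    """The same statistic computed directly from a list of observations."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return t_from_summary(mean, sum_sq, n, hypothetical_mean)


if __name__ == "__main__":
    t = t_from_summary(4.29, 937.189, 10)
    print(round(t, 2), "on", 10 - 1, "degrees of freedom")  # 1.33, below 1.38
```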

Essential Character of t

Essentially, t is the ratio of a quantity distributed normally
about a mean of zero, to an estimate, based on n degrees of freedom,
of the standard error of that quantity. Any such ratio, however it
arises, has the same sampling distribution as t.


Significance of Differences between Means

5.3. The quantity t may be used for testing the significance of the
difference between two sample means. We shall deal here only with
the situation in which the assumption is made that the samples
are from populations having a common standard deviation σ as well
as the same mean.* Then, if N_1 and N_2 are the numbers in the
two samples, and x̄_1 and x̄_2 are the sample means, (x̄_1 − x̄_2) is
distributed normally about zero and an estimate of its standard
deviation (or error) is

$$s\sqrt{\frac{1}{N_1} + \frac{1}{N_2}},$$

where s is obtained by summing the squares of the deviations from
the two sample means, dividing by the total degrees of freedom
and finding the square root. Thus

$$s = \sqrt{\frac{S_1(x_1 - \bar{x}_1)^2 + S_2(x_2 - \bar{x}_2)^2}{N_1 + N_2 - 2}}, \qquad (5.2)$$

where S_1 and S_2 are the summations over the two samples and
x_1 and x_2 are individuals in the two samples. Then

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{1/N_1 + 1/N_2}},$$

which in large samples is distributed normally with unit standard
deviation, is in small samples distributed as Fisher's t, the degrees
of freedom being

$$n = N_1 + N_2 - 2.$$




Our example is from data provided by Corkill (1930) showing the
effect of insulin on rabbits; the results for separate animals are given
in Table 5.2. The difference in means is not large compared with
the variations within each sample, and a statistical test of
significance is necessary. The sums of squares of the deviations are 0·253 0
and 0·071 5, and thus

$$s^2 = \frac{0.324\,5}{19} = 0.017\,08, \qquad s = 0.130\,7;$$

there are ten and eleven rabbits in the two samples, so

$$\sqrt{\frac{1}{10} + \frac{1}{11}} = 0.436\,9$$

and

$$t = \frac{0.138}{0.130\,7 \times 0.436\,9} = 2.41.$$

* Since normality is also assumed, the hypothesis and assumptions
amount to the composite hypothesis that the two populations are one.
Judging from experience, however, the tests are not very sensitive to
moderate departures from normality nor to small differences in standard
deviation.



TABLE 5.2

Muscle Glycogen (per cent.)

Controls        After Insulin
  0·19             0·15
  0·18             0·13
  0·21             Trace*
  0·30             0·07
  0·66             0·27
  0·42             0·24
  0·08             0·19
  0·12             0·04
  0·30             0·08
  0·27             0·20
                   0·12
Means 0·273        0·135

* Assumed to be 0·00.


The mean variance has been calculated on 19 degrees of freedom,
and for these conditions, t = 2·09 lies on the 0·05 level of
significance; thus the effect of insulin on muscle glycogen appears
to be significant.

The difference between this and the earlier example of the effect
of electrical treatment on growth is that here there is no reason for
taking controls and treated animals in pairs; they are all independent.
In the former instance, boxes were treated in parallel pairs, and there
was some reason for expecting that these would be subject to some
of the same disturbing factors, which thus would not affect the
differences.
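The two-sample calculation for Table 5.2 can be sketched in Python, treating the "trace" as 0·00 as in the table's footnote. The function name is hypothetical. The 5 per cent. point of 2·09 on 19 degrees of freedom is taken from the text rather than computed.

```python
import math

CONTROLS = [0.19, 0.18, 0.21, 0.30, 0.66, 0.42, 0.08, 0.12, 0.30, 0.27]
AFTER_INSULIN = [0.15, 0.13, 0.00, 0.07, 0.27, 0.24, 0.19, 0.04, 0.08, 0.20, 0.12]


def two_sample_t(a, b):
    """t for the difference of two means, using a pooled estimate s of the common sigma."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    df = n1 + n2 - 2
    s = math.sqrt(ss / df)
    return (m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2)), df


if __name__ == "__main__":
    t, df = two_sample_t(CONTROLS, AFTER_INSULIN)
    print(round(t, 2), df)  # 2.41 19, above 2.09
```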
Significance of Differences between Variances

5.4. When testing the difference between variabilities for large
samples in section 3.53, we assumed the difference in the standard
deviations of the samples, s_1 − s_2, to be distributed normally with
a standard error of

$$\sigma\sqrt{\frac{1}{2N_1} + \frac{1}{2N_2}},$$

where σ is the standard deviation in the population from which the
samples are presumed to be taken; and not knowing σ, we substituted
s_1 and s_2 in this expression. Errors arising from this approximation
are again important in small samples, but Fisher (1924a and 1936a)
has suggested a new index:

$$z = \tfrac{1}{2}\left(\log_e s_1^2 - \log_e s_2^2\right) = \log_e \frac{s_1}{s_2}, \qquad (5.4)$$

where s_1² and s_2² are the two variances calculated on the degrees of
freedom. For samples of N_1 and N_2 drawn from the same population,
if n_1 and n_2 are the numbers of degrees of freedom, z is
distributed in the form

$$df = k\,\frac{e^{n_1 z}}{\left(n_1 e^{2z} + n_2\right)^{(n_1 + n_2)/2}}\,dz,$$

where n_1 and n_2 are the degrees of freedom (= N_1 − 1 and N_2 − 1)
and k is a constant.* This distribution, containing as variables
only z, n_1 and n_2, is independent of the standard deviation of the
population, σ, and since it involves no approximating assumptions,
is applicable to small samples. The quantity z may vary between
plus and minus infinity, being negative when s_1/s_2 is less and
positive when s_1/s_2 is greater than unity, and unless n_1 = n_2, is
skew. The positive part of the curve of z = log_e(s_1/s_2), however, is
the same as the negative part of the curve of z = log_e(s_2/s_1), and so
the probability integrals for positive deviations only are sufficient for
any combination of degrees of freedom; the others can be obtained by
interchanging n_1 and n_2. It is simpler, however, not to deal with
negative values of z, but always to take the difference of logarithms
so that it is positive, and hence to choose n_1 to be the degrees of
freedom on which the larger variance is measured. Fisher (1936a)
gives values of z at which ordinates cut off “tails” of 5, 1 and
0·1 per cent. of the total area of the curve for values of n_1 and n_2
chosen according to the above convention.* These points, however,
are not the corresponding levels of significance as we understand
them. The hypothesis is that the two variances are estimates of one
and the same population value, and that z (which is the difference
between the natural logarithms of the two standard deviations) is
zero. Applying the arguments of section 3.32, we regard as lying on
the 0·05 level of significance values of z at which ordinates cut off
tails, each of which is about 0·025 of the whole area; the 5 per cent.
level of significance, therefore, lies somewhere between the 5 and
1 per cent. points of Fisher's tables, unless there is some a priori
reason for supposing the variance estimated by s_1² is either equal to
or greater than that estimated by s_2² (compare p. 77).

* This distribution of z is related to that of s and hence to that of χ²
(see Fisher, 1924a).

When n_1 and n_2 are large, or when they are moderate and nearly
equal, the distribution of z becomes nearly normal with a standard
error of

$$\sqrt{\tfrac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

and in such circumstances a value of z greater than twice this value
lies above the 5 per cent. level of significance. For example, when
n_1 = n_2 = 24, this standard error becomes 0·204, and a z of 0·408
lies on the 5 per cent. level (that is, between 0·342 5 and 0·489 0,
at which according to Fisher's tables P = 0·05 and 0·01). In order
to check the assumption of normality, we may compare the deviation
at which an ordinate cuts off a tail of 5 per cent. of the whole curve
(1·65 times the standard error = 1·65 × 0·204 = 0·337) with
0·342 5; the agreement is quite good.

For the heights of Englishmen and Scotsmen of Table 3.1, p. 74,
the standard deviations are 2·548 and 2·480, so z = log_e 1·027
= 0·026 6, and using the above approximation its standard error is
0·021 5; z is only 1·23 times its standard error and is not significant.
The other test in section 3.53 gave the difference in standard
deviations as 1·27 times its standard error, and this shows that in
very large samples it is as well to use the difference between standard
deviations as z.

* Mahalanobis (1932) gives similar tables for the 5 and 1 per cent. points
of s_1/s_2 and s_1²/s_2², and by using these, the rather troublesome
transformation of equation (5.4) may be avoided. Tables D1 and D2 at the
end of this book give corresponding values of log_10
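The index z and its large-sample standard error can be sketched in Python. The function names are hypothetical. The check reproduces the standard error of 0·204 for n_1 = n_2 = 24 quoted above. It also reproduces z for the two glycogen variances of Table 5.2 (0·028 11 and 0·007 15), which the text takes up next.

```python
import math


def fisher_z(var1, var2):
    """z = 1/2 (log_e var1 - log_e var2) = log_e(s1 / s2)."""
    return 0.5 * (math.log(var1) - math.log(var2))


def z_standard_error(n1, n2):
    """Approximate standard error of z when n1 and n2 are large or nearly equal."""
    return math.sqrt(0.5 * (1 / n1 + 1 / n2))


if __name__ == "__main__":
    se = z_standard_error(24, 24)
    print(round(se, 3), round(2 * se, 3))  # 0.204 0.408
    print(round(fisher_z(0.02811, 0.00715), 3))
```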

As an illustration of a small sample, we will test the difference in
variability of muscle glycogen between the controls and insulin-treated
rabbits of Table 5.2. The sums of squares are 0·253 0 and
0·071 5, the variances are 0·028 11 and 0·007 15, z = ½ log_e 3·93
= 0·684, n_1 = 9 and n_2 = 10.

Fisher's tables do not give z for these values of n_1 and n_2, but
interpolation is made easy by the fact that for any one value of n_2,
changes in z are nearly proportional to changes in 1/n_1. Linear
interpolation with respect to 1/n_1 is therefore appropriate. We shall
write z_5 and z_1 for values on the 5 and 1 per cent. points.

Then, for n_2 = 10,

when n_1 = 8,   1/n_1 = 0·125 00,   z_5 = 0·561 1,   and z_1 = 0·810 4;
when n_1 = 12,  1/n_1 = 0·083 33,   z_5 = 0·534 6,   and z_1 = 0·774 4;

and the differences are:

diff(1/n_1) = 0·041 67,   diff(z_5) = 0·026 5,   and diff(z_1) = 0·036 0.

Hence, when n_1 = 9, 1/n_1 = 0·111 11,

$$z_5 = 0.561\,1 - \frac{0.013\,89}{0.041\,67} \times 0.026\,5 = 0.552\,3,$$

$$z_1 = 0.810\,4 - \frac{0.013\,89}{0.041\,67} \times 0.036\,0 = 0.798\,4.$$

The value z = 0·684 lies between the 5 and 1 per cent. points, but
it is doubtful if it is on the 5 per cent. level of significance, and we
can only say that although the data suggest that the effect of insulin
has been to make the muscle glycogen percentages more regular,
further observations are necessary to establish the fact.
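The interpolation in 1/n_1 can be written as a short Python function. The name is hypothetical. The tabulated points for n_2 = 10 (n_1 = 8 and 12) are those quoted in the text rather than values computed from the z distribution.

```python
def interpolate_reciprocal(n, n_a, z_a, n_b, z_b):
    """Linear interpolation in 1/n between the tabulated points (n_a, z_a) and (n_b, z_b)."""
    w = (1 / n - 1 / n_a) / (1 / n_b - 1 / n_a)
    return z_a + w * (z_b - z_a)


if __name__ == "__main__":
    z5 = interpolate_reciprocal(9, 8, 0.5611, 12, 0.5346)
    z1 = interpolate_reciprocal(9, 8, 0.8104, 12, 0.7744)
    print(round(z5, 4), round(z1, 4))  # 0.5523 0.7984
    print(z5 < 0.684 < z1)             # True
```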
Interpolation with respect to n_2 is scarcely ever necessary, since
the tables are given for n_2 increasing by units between 1 and 30.


Significance of Variations between Several Samples 

5.5. The method of section 5.3 tests the hypothesis that two samples
are from populations having the same mean, assuming they have
the same standard deviation. When there are more than two
samples, the hypothesis may best be tested by the analysis of
variance described in the next chapter.

It is sometimes useful to test the significance of the general 
differences between the variances of more than two samples. For
instance, it may be desired to combine data from several experi- 
ments or sources, and before doing so, it is advisable to ascertain 
that the variance within each set of data is sensibly the same; or 
again, when examining the products of a factory statistically, any 
difference in the variability of the articles produced on different 
machines, sections, or times is taken as evidence of lack of control. 

Neyman and E. S. Pearson (1931) have proposed an index, $L_1$, for testing this. If there are k estimates of variance, $v_1, v_2, \ldots, v_k$, each based on N observations ($n = N - 1$ are the degrees of freedom), the index is*

$$L_1 = \frac{(v_1 v_2 \cdots v_k)^{1/k}}{\frac{1}{k}(v_1 + v_2 + \cdots + v_k)}.$$

Thus, $L_1$ is the ratio of the geometric to the arithmetic mean of the variances. Mahalanobis (1933) and Nayer (1936) have tabled values of $L_1$ lying on the 5 and 1 per cent. levels of significance for various values of k and N;† Nayer's tables are the fuller. One property of $L_1$ is that it decreases in value as the variances differ more among themselves; it has a maximum value of unity when the variances are equal.

In a series of six experiments (Tippett, 1934), the following six variances of cotton yarn breakages were obtained: 0.2056, 0.5578, 0.6489, 0.3378, 0.6194 and 0.4311; each was based on 9 degrees of freedom. Before combining the results of these experiments, it was considered necessary to test the variances for homogeneity. Here, k = 6, N = 10 and $L_1 = 0.930$. From Nayer's tables, $L_1 = 0.81$ lies on the 5 per cent. level, and, remembering that a high $L_1$ corresponds to greater uniformity of variance, we conclude that the value of 0.93 is not significant.

* Our notation differs from that of Neyman and Pearson.

† Mahalanobis and Nayer write n for the number of observations in a sample, corresponding to our N. In applying this test, we shall consistently use the small letter for degrees of freedom. Nayer writes f for our n.
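As a minimal sketch (the function name is hypothetical), $L_1$ can be computed directly as the ratio of the geometric to the arithmetic mean; applied to Tippett's six variances it reproduces the value 0.93 quoted above. Judging significance still requires Nayer's or Mahalanobis's tables, which the sketch does not attempt to reproduce.

```python
import math


def neyman_pearson_l1(variances):
    """Ratio of the geometric mean to the arithmetic mean of k variance estimates."""
    k = len(variances)
    geometric = math.exp(sum(math.log(v) for v in variances) / k)
    arithmetic = sum(variances) / k
    return geometric / arithmetic


# Tippett's six variances of cotton yarn breakages, each on 9 degrees of freedom.
tippett = [0.2056, 0.5578, 0.6489, 0.3378, 0.6194, 0.4311]
print(f"L1 = {neyman_pearson_l1(tippett):.2f}")  # L1 = 0.93
```

Working with logarithms for the geometric mean avoids forming the product of many small variances directly.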

Small Samples from the Binomial and Poisson Distributions 

5.6. In Chapter II we used the binomial and Poisson series to test the homogeneity of a large number of counts of seeds and yeast cells; but it is not always possible to obtain such large samples from one population. However, if there are small samples of replicate counts from a variety of populations, the experience so obtained can be reduced to a common measure and added.

5.61. We will develop the tests for the binomial by reference to Table 4.3. The numbers of round and angular peas and the totals may be regarded as forming a 2 × 10 contingency table, and, obtaining the expectations from the totals (i.e. ignoring the Mendelian expectation), it may be tested for randomness by means of the χ² calculated on 9 degrees of freedom, as shown in section 4.4. If this value of χ² is significant, the data are neither random nor homogeneous. If several such tables are obtained (say) from a number of varieties of peas in which the expectations are not necessarily equal, all the values of χ² and their degrees of freedom may be added, and the sum tested in the usual way. Further, there may be enough tables with the same number of degrees of freedom (say a hundred or more) to form a frequency distribution of χ², and this may be compared with the theoretical form obtained from the standard tables. The constant χ² is thus the common measure to which a variety of experiences may be reduced.

The expression for χ² becomes particularly simple if the number of peas from each plant is constant. Using more general language, if there are g' parallel sets, each of n trials, with $m_1, m_2, \ldots, m_{g'}$ successes in the sets and a mean of m successes,* then

$$\chi^2 = \frac{S(m_s - m)^2}{m\left(1 - \dfrac{m}{n}\right)}, \qquad (5.6)$$

and there are (g' − 1) degrees of freedom. This, like the χ² obtained in section 4.3 under a slightly different assumption, is a measure of the variation in $m_s$ between plants. If we had a number of such sets of 10 for different kinds of plants, we could find the χ² for each set, and compare their distribution with the theoretical one for 9 degrees of freedom. In the same way, for instance, the uniformity of conditions in seed germination can be investigated from a large experience of many sorts of seed, as long as there is the same number of replicates (g') for each sort of seed.

* In Table 4.3 the g' sets are the 10 plants, the n trials are the total peas on each plant (we are now dealing with the case where n is constant), and the $m_1, m_2, \ldots$ successes are the numbers of round peas.
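Equation (5.6) translates directly into code. In this sketch the function name and the two counts are hypothetical; for two lots of 50 it reduces to the special form used for the cotton seeds below, which the last lines check numerically. It is undefined when m is 0 or equal to n.

```python
def binomial_dispersion(successes, n):
    """Equation (5.6): chi-squared for g' parallel sets of n trials each.

    Returns (chi2, degrees_of_freedom).
    """
    g = len(successes)
    m = sum(successes) / g                        # mean number of successes per set
    ss = sum((ms - m) ** 2 for ms in successes)   # S(m_s - m)^2
    return ss / (m * (1 - m / n)), g - 1


# Hypothetical counts: two lots of 50 seeds, with 30 and 40 germinating.
chi2, df = binomial_dispersion([30, 40], 50)

# Special case for two lots of 50: (m1 - m2)^2 / [(m1 + m2)(1 - (m1 + m2)/100)].
two_lot = (30 - 40) ** 2 / ((30 + 40) * (1 - (30 + 40) / 100))
print(round(chi2, 4), df, round(two_lot, 4))  # 4.7619 1 4.7619
```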

In one series of experiments made to find the effect of various treatments on cotton seeds, 100 seeds were exposed in two lots of 50 for each treatment. In such a case, if $m_1$ and $m_2$ are the numbers germinating in the two lots of 50,

$$\chi^2 = \frac{(m_1 - m_2)^2}{(m_1 + m_2)\left(1 - \dfrac{m_1 + m_2}{100}\right)},$$

and there is one degree of freedom (g' = 2). There were 52 treatments, giving a total χ² of 82.84. The tables do not extend to so many degrees of freedom, so we may use the approximation mentioned in section 4.5, assuming

$$\sqrt{2\chi^2} - \sqrt{2g - 1}$$

to be normally distributed with unit standard deviation. Here g = 52, and

$$\sqrt{2\chi^2} - \sqrt{2g - 1} = 2.72,$$

corresponding to P = 0.003; this suggests that there must have been some lack of uniformity in conditions. The frequency distribution of χ² and the theoretical form as found from Fisher's table for one degree of freedom are compared in Table 5.3; as Fisher gives only a few probability integrals, the intervals of χ² are unequal.
There are obviously too many pairs of counts with large values of χ², and to test the divergences we must use the rather overworked constant, χ², again, calculating a χ² according to equation (4.1), as in Chapter IV, as a measure of the differences between the two distributions of Table 5.3.* In doing this, we have combined the groups in pairs according to the brackets, and obtain χ² = 7.86, which for three degrees of freedom (i.e. four combined groups, and only the total of the theoretical distribution has been adjusted) lies almost exactly on the 0.05 level of significance. This test of randomness is less stringent than the one above adding all the values of χ², for it groups all those over 2.706 together, and takes no account of how much greater they may be.

* This double use of χ² may be a little confusing, but if readers for the moment forget that the variate of Table 5.3 is χ² and just regard the table as two distributions, the differences between which are to be tested by calculating χ², all will be well.
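The normal approximation used for the 52 cotton-seed treatments can be sketched as follows, assuming a one-tailed probability from the unit normal curve, computed here with `math.erfc` rather than read from a table. The helper names are hypothetical.

```python
import math


def chi2_normal_deviate(chi2, g):
    """sqrt(2*chi2) - sqrt(2g - 1): roughly a unit normal deviate when g is large."""
    return math.sqrt(2 * chi2) - math.sqrt(2 * g - 1)


def upper_tail(d):
    """Probability that a unit normal deviate exceeds d (one tail)."""
    return 0.5 * math.erfc(d / math.sqrt(2))


# Total chi-squared for 52 treatments, each with one degree of freedom.
d = chi2_normal_deviate(82.84, 52)
print(f"d = {d:.2f}, P = {upper_tail(d):.3f}")  # d = 2.72, P = 0.003
```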

TABLE 5.3

                              Frequencies
χ²                       Expected     Observed
0      – 0.0158             5.2           5   ┐
0.0158 – 0.0642             5.2           1   ┘
0.0642 – 0.148              5.2           3   ┐
0.148  – 0.455             10.4          10   ┘
0.455  – 1.074             10.4          10   ┐
1.074  – 1.642              5.2           5   ┘
1.642  – 2.706              5.2           8   ┐
over 2.706                  5.2          10   ┘
Total                      52.0          52
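The paired comparison for Table 5.3 can be checked with a short sketch. The observed frequencies are taken as they stand in the table (they sum to 52); combining them in pairs as bracketed gives a χ² of about 7.87 on 3 degrees of freedom, close to the 7.86 quoted in the text. The helper names are hypothetical.

```python
def combine_in_pairs(values):
    """Add adjacent groups: (1st + 2nd), (3rd + 4th), ..."""
    return [values[i] + values[i + 1] for i in range(0, len(values), 2)]


def goodness_of_fit_chi2(observed, expected):
    """Pearson's chi-squared, S (o - e)^2 / e, over the groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Table 5.3: expected (52 x probability from Fisher's table) and observed frequencies.
expected = [5.2, 5.2, 5.2, 10.4, 10.4, 5.2, 5.2, 5.2]
observed = [5, 1, 3, 10, 10, 5, 8, 10]

obs_pairs = combine_in_pairs(observed)   # [6, 13, 15, 18]
exp_pairs = combine_in_pairs(expected)   # about [10.4, 15.6, 15.6, 10.4]
chi2 = goodness_of_fit_chi2(obs_pairs, exp_pairs)
df = len(obs_pairs) - 1                  # only the total has been adjusted
print(f"chi2 = {chi2:.2f} on {df} degrees of freedom")  # chi2 = 7.87 on 3 degrees of freedom
```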



5.62. For the Poisson distribution, the index corresponding to that of equation (5.6) is

$$\chi^2 = \frac{S(m_s - m)^2}{m},$$

where $m_s$ is any individual count, m is the mean, and S is the summation over all counts. Fisher, Thornton and Mackenzie (1923) first used this to test the accuracy of bacterial counts, and found significant deviations from expectation. More recently, Smith and Prentice (1929) have used it to check their technique of cyst counts in soil. They had 73 sets of 10 counts from different samples of soil, and calculated the 73 values of χ². The distribution of this is compared with the theoretical one for 9 degrees of freedom in Table 5.4 below; the expectations, being obtained from Fisher's book (1936a), are for unequal intervals of χ².
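A minimal sketch of the Poisson index of dispersion follows; the function name is hypothetical and the ten counts are invented for illustration, not Smith and Prentice's data. A set of k counts gives a χ² on k − 1 degrees of freedom.

```python
def poisson_dispersion(counts):
    """Index of dispersion S(m_s - m)^2 / m for replicate counts; returns (chi2, df)."""
    k = len(counts)
    m = sum(counts) / k
    chi2 = sum((c - m) ** 2 for c in counts) / m
    return chi2, k - 1


# Hypothetical set of 10 replicate counts from one sample.
chi2, df = poisson_dispersion([12, 9, 15, 11, 8, 14, 10, 13, 9, 9])
print(f"chi2 = {chi2:.3f} on {df} degrees of freedom")  # chi2 = 4.727 on 9 degrees of freedom
```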
TABLE 5.4

                              Frequencies
χ²                       Expected     Observed
0      – 4.168              7.3           7
4.168  – 5.380              7.3           7
5.380  – 6.393              7.3           6
6.393  – 8.343             14.6          22
8.343  – 10.656            14.6          18
10.656 – 12.242             7.3           2
12.242 – 14.684             7.3           5
over 14.684                 7.3           6
Total                      73.0          73


The agreement is quite fair; it is tested in the usual manner by calculating another

$$\chi^2 = 9.604,$$

which for seven degrees of freedom gives P greater than 0.2.
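The test of Table 5.4 uses the same goodness-of-fit χ². In the sketch below the expected frequencies are built from the probability proportions implied by the interval limits (the 0.90, 0.80, 0.70, 0.50, 0.30, 0.20 and 0.10 points of χ² for 9 degrees of freedom), so each expected entry is 73 times 0.1 or 0.2. The observed frequencies are those of the table, which sum to 73.

```python
# Table 5.4: each interval holds a proportion 0.1 or 0.2 of the theoretical
# distribution of chi-squared for 9 degrees of freedom.
proportions = [0.1, 0.1, 0.1, 0.2, 0.2, 0.1, 0.1, 0.1]
observed = [7, 7, 6, 22, 18, 2, 5, 6]

total = sum(observed)                          # 73
expected = [p * total for p in proportions]    # 7.3 or 14.6
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                         # only the total is fitted
print(f"chi2 = {chi2:.2f} on {df} degrees of freedom")  # chi2 = 9.60 on 7 degrees of freedom
```

For 7 degrees of freedom the 0.20 point of χ² is 9.803, so a value near 9.6 indeed gives P a little greater than 0.2.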






CHAPTER VI 


THE ANALYSIS OF VARIANCE 

Previous chapters are based on the presentation of collections of 
data in the form of frequency distributions and on the description 
of a few features, particularly the extent and form of the variation, 
by frequency constants. This is a process of summarisation in which 
some detail is inevitably lost. We shall now reverse the process and 
deal with statistical methods designed to recover some of the detail 
that is of value, starting from the frequency distribution and its 
constants as bases. Comparatively little has been done in this 
direction with the form of variation, but the amount of variation 
can be split up into parts associated with different causes or sources. 
The method of analysis of variation is introduced in its simple 
quantitative form in this chapter; and most of the subjects dealt 
with in subsequent chapters are developed from the same basic 
method. As a first step, we shall in the next section prove a funda- 
mental mathematical property of variance. 

Variance an Additive Quantity 

6.1. Variability may be considered to arise from a multitude of causes producing small deviations; but it often happens that to these there are added larger deviations due to a few more important causes, and the variation is not always homogeneous. It is a fundamental property of that measure of the degree of variability, the variance, that it is additive, i.e. if a quantity is subject to the operation of several independent causes, each of which contributes a certain variance, then the final variance of the quantity is the sum of those due to the several causes. Let it be assumed, for example, that a quantity x is subject to random variations, and to others associated with two factors A and B; then the value of any one observation of x is

$$x = \xi + \alpha + \beta + \varepsilon,$$

where ξ is the mean, α and β are the deviations arising from A and B, and ε is the random deviation. The square of the deviation of x from its mean is

$$(x - \xi)^2 = \alpha^2 + \beta^2 + \varepsilon^2 + 2\alpha\beta + 2\alpha\varepsilon + 2\beta\varepsilon,$$
and this may be summed for a sample of N individuals, and divided by the degrees of freedom (N in this case, since we have not found the mean ξ from the sample, but have assumed it). Thus we obtain

$$\frac{S(x - \xi)^2}{N} = \frac{S\alpha^2}{N} + \frac{S\beta^2}{N} + \frac{S\varepsilon^2}{N} + \frac{2S\alpha\beta}{N} + \frac{2S\alpha\varepsilon}{N} + \frac{2S\beta\varepsilon}{N},$$

and as N becomes indefinitely large, the last three terms of this equation tend to zero if α, β and ε are independent; the other terms are the squares of the standard deviations, or variances, so that finally

$$\sigma_x^2 = \sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\varepsilon^2. \qquad (6.1)$$

Hence the variance of x is the sum of the random variance and of those due to A and B. We used this property in Chapter III to find the standard error of a difference in terms of those of the two means; since the variance is the square of the standard error, the above equation leads directly to equation (3.1).* We shall use it now to analyse the variability of quantities into parts.
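Equation (6.1) can be illustrated by simulation. The sketch below is a toy under stated assumptions, not data from the text: it draws independent normal deviations for A, B and the random part with hypothetical standard deviations, and compares the variance of x about the known mean ξ (divisor N) with the sum of the three component variances.

```python
import random

random.seed(1)
N = 100_000
xi = 10.0                          # the mean, assumed known
sd_a, sd_b, sd_e = 2.0, 1.0, 0.5   # hypothetical s.d.s for A, B and the random part

xs = [xi + random.gauss(0, sd_a) + random.gauss(0, sd_b) + random.gauss(0, sd_e)
      for _ in range(N)]
var_x = sum((x - xi) ** 2 for x in xs) / N   # divisor N, since xi is not estimated
var_sum = sd_a ** 2 + sd_b ** 2 + sd_e ** 2  # 4 + 1 + 0.25 = 5.25
print(f"variance of x = {var_x:.3f}; sum of component variances = {var_sum}")
```

The agreement is only approximate for finite N, because the cross-product terms vanish only on average.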

Analysis of Variance 

6.2. Consider Table 6.1, in which the data (Harris, 1910) are frequency distributions of ovaries containing different numbers of ovules, for ten separate shrubs of the American Bladder Nut. The separate columns are called arrays. It is obvious that there are considerable differences between the shrubs, for while shrub 11 has ovaries with 22–30 ovules, shrub 13 has ovaries with between 17 and 24 ovules, and the other shrubs show similar differences. The variations between ovaries on any one shrub are less than those between all taken together, and the whole table suggests that we may legitimately divide the total variability into two parts: one associated with differences between ovaries from the same shrub, and another with differences between shrubs. We say that there is an association between the shrubs and ovules per ovary, and as evidence adduce the fact that the mean variance of deviations from the shrub means, as found by the method of section 5.1, which is 3.057, is very much less than the variance of the “totals” column (5.385).

Indeed, Table 6.1 is really a manifold contingency table, but the association is expressed differently because one variate is quantitative; a treatment by the method of the contingency table would be very laborious with so many squares.

* α and β may be positive or negative.

We may present the analysis of the variability into the two parts in a systematic manner. Let x be any individual reading, $\bar x_1, \bar x_2, \ldots, \bar x_m$ the m shrub means, $\bar x$ the grand mean, and n the number of readings per shrub, so that there are nm = N readings altogether.

TABLE 6.1

Frequencies of Ovaries (Series O, 1908 C)

                      Serial Number of Shrub
Ovules per ovary   11   12   13   15   16   18   19   20   21   22   Totals

I 


i 


1 

[ 

17 

f 

— 

I 


18 

— 

2 

23 

— 

s 

19 

—— 

.5 

13 

> 

\ 

20 

— 

8 

27 

— 

} 

>» 

21 

— 

10 

4 

I 

cl 

• > 

22 

^ 4 

21 

18 

— 

{ 0 

23 

5 

17 

7 

^ 3 

u „ 









! 



eft 

24 

41 

i 35 

7 

16 

eu - 





3 

25 





> 

21 

2 

— 

25 

0 

26 

9 

— 

— 

21 


27 

9 

— 

— 

8 


28 

4 

— 

— 

8 


29 

3 

— 

— 

12 


30 

4 

— 

— 

5 


31 

— 

— 

— 

I 


1 


I 

6 

I 

— 

— 

I 

34 

I 

4 

5 

5 

— 

1 

34 

I 

4 

I 

5 

2 

2 

50 

I 

5 

2 

7 

1 

I 

32 

7 

23 

9 

12 

6 

2 

92 

9 

16 

IS 

25 

8 

15 

120 

42 

48 

44 

39 

6r 

68 

401 

14 

3 

18 

6 

10 

8 

107 

9 

I 

4 

1 

4 

2 

51 

5 

— 

I 

— 

3 

— 

26 

3 

— “ 

— 

— 

2 

— 

17 

3 

— 

— 

— 

2 

— 

20 

4 

— 

— 

— 

1 

— 

14 

— 

— 

— 

— 

— 


I 


Totals             100  100  100  100  100  100  100  100  100  100   1 000

Then, for any ovary on any one shrub, the deviation from the grand mean is the deviation from the shrub mean plus that of the shrub mean from the grand mean:

$$(x - \bar x) = (x - \bar x_s) + (\bar x_s - \bar x).$$

We will square these deviations and sum for all observations from the one shrub; denoting this summation by $S'$ and applying the rules in section 1.3, we obtain

$$S'(x - \bar x)^2 = S'(x - \bar x_s)^2 + n(\bar x_s - \bar x)^2 + 2(\bar x_s - \bar x)\,S'(x - \bar x_s).$$
The term $(\bar x_s - \bar x)^2$ is constant for the one shrub, and its sum for the n observations is n times the single value (the second term in the above equation); the third term above is zero, since the sum of the deviations from the mean, $S'(x - \bar x_s)$, is necessarily zero. To obtain the total sum of squares, we now sum the terms of this equation again for all shrubs (using the sign $S_s$) and finally obtain

$$S_sS'(x - \bar x)^2 = S_sS'(x - \bar x_s)^2 + nS_s(\bar x_s - \bar x)^2. \qquad (6.2)$$

The sign $S_sS'$ is equivalent to S, the simple summation over all observations in the sample. For the data of Table 6.1, equation (6.2) is

$$5\,379.775 = 3\,026.350 + 2\,353.425,$$

and each of these terms is given an appropriate place in Table 6.2 of the Analysis of Variance. The terms are the sums of squares of deviations (a) of individual observations from the grand mean, (b) of individual observations from the shrub means, and (c) of the shrub means from the grand mean (the sum being multiplied by n in this case). The degrees of freedom are given in the table: for the total, one mean has been found and there are thus 999 degrees; for the deviations from the shrub means, each shrub contributes 99 degrees, giving a total of 990; and for the shrub means, 10 deviations are measured from the grand mean, leaving 9 degrees of freedom. The sums of squares and degrees of freedom of the parts should add up to give the “total”, and the sums of squares, divided by the degrees of freedom, give estimates of the variances. That for the “total” is the ordinary variance of all the observations in the sample, and that for “within a shrub” is the variance of the intra-shrub deviations found after the method of section 5.1, and is an estimate, based on 990 degrees of freedom, of the true variance, $\sigma_r^2$ (say). The variance “between shrubs” is n times the square of the standard deviation of shrub means. To investigate this more fully, let us suppose we have an indefinitely large number of ovaries from each of an indefinitely large number of shrubs, and let the variance of these shrub means be $\sigma_s^2$; this is the true value for the infinite population. If now we have only n ovaries from each shrub, the means are subject to random errors due to the intra-shrub variation, and their standard error is $\sigma_r/\sqrt{n}$. These errors increase the variability of the shrub means, and the resulting variance may be obtained by adding the two components, as shown in equation (6.1), giving

$$\sigma_s^2 + \frac{\sigma_r^2}{n};$$

the “between shrubs” variance of Table 6.2 is an estimate, based on 9 degrees of freedom, of n times this. If $V_s$ is the variance between shrubs and $V_r$ the variance within a shrub, as found from the sample,*

$$V_s \doteq n\sigma_s^2 + \sigma_r^2, \qquad V_r \doteq \sigma_r^2. \qquad (6.3)$$


TABLE 6.2
Analysis of Variance

Source of Variation    Sum of Squares    Degrees of Freedom    Variance
Between shrubs            2 353.425               9            261.492
Within a shrub            3 026.350             990              3.057
Total                     5 379.775             999              5.385

These two relations are the basis of the expression of association. If the variation between shrubs is relatively important, $n\sigma_s^2$ is large compared with $\sigma_r^2$, and the two estimates $V_s$ and $V_r$ are very different. If the shrub variation is zero, $\sigma_s^2 = 0$ and $V_s$ tends to equal $V_r$. In such a case, for normally distributed variates, $V_s$ and $V_r$ are two independent estimates of the same variance, $\sigma_r^2$. They are subject to the same random errors as are all estimates of variance, and the significance of any difference should be tested by the methods of section 5.4. For Table 6.2, the one (261.492) is so much greater than the other (3.057) that the reality of the association needs no test. Using the relations of equation (6.3),

$$261.492 \doteq 100\,\sigma_s^2 + \sigma_r^2, \qquad 3.057 \doteq \sigma_r^2,$$

whence $\sigma_s^2 \doteq 2.584$.

* The sign ≐ denotes that the quantity on the left is an estimate of that on the right, and that the former approaches the latter as the size of the sample (number of degrees of freedom in both parts) increases indefinitely.
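The arithmetic of Table 6.2 and equation (6.3) is short enough to check in a few lines. In this sketch `anova_from_sums` is a hypothetical name; the inputs are the sums of squares and degrees of freedom of Table 6.2, with n = 100 ovaries per shrub.

```python
def anova_from_sums(ss_between, df_between, ss_within, df_within, n):
    """Variances of a one-way analysis and the estimate of sigma_s^2 from (6.3)."""
    v_s = ss_between / df_between   # estimates n*sigma_s^2 + sigma_r^2
    v_r = ss_within / df_within     # estimates sigma_r^2
    sigma_s2 = (v_s - v_r) / n
    return v_s, v_r, sigma_s2


v_s, v_r, sigma_s2 = anova_from_sums(2353.425, 9, 3026.350, 990, n=100)
print(f"V_s = {v_s:.3f}, V_r = {v_r:.3f}, sigma_s^2 = {sigma_s2:.3f}")
# V_s = 261.492, V_r = 3.057, sigma_s^2 = 2.584
print(f"variance for a random sample of ovaries: {sigma_s2 + v_r:.3f}")  # 5.641
```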
If a sample of ovaries were taken at random from these shrubs, the number from each shrub being left to chance, the variance would be $\sigma_s^2 + \sigma_r^2$, of which an estimate is 5.641. This differs slightly from the variance in the “total” row of Table 6.2 because the sample to which the latter refers is not entirely random; it has been arranged so that 100 ovaries come from each shrub.

Table 6.3 summarises the above relations and sets out the analysis of variance for the general case of n observations on each of m arrays. The variance within an array is often called the residual; having performed the analysis, we are not very interested in the variance of the “total” row, since the two parts contain all the useful information.

TABLE 6.3
Analysis of Variance

Source of Variation   Sum of Squares               Degrees of Freedom       Variance
Between arrays        $nS_s(\bar x_s - \bar x)^2$    $m - 1$                  $V_s \doteq n\sigma_s^2 + \sigma_r^2$
Within an array       $S_sS'(x - \bar x_s)^2$        $m(n - 1) = N - m$       $V_r \doteq \sigma_r^2$
Total                 $S_sS'(x - \bar x)^2$          $mn - 1 = N - 1$
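The layout of Table 6.3 can be sketched as a function that finds the deviations explicitly, the direct method described at the start of section 6.3. The data are a small hypothetical set of three arrays of four readings, and the function name is an assumption; the between and within sums of squares should add up to the total.

```python
def one_way_anova(arrays):
    """Sums of squares and degrees of freedom of Table 6.3 from explicit deviations.

    Assumes every array holds the same number n of readings.
    """
    m, n = len(arrays), len(arrays[0])
    N = m * n
    grand = sum(sum(a) for a in arrays) / N
    means = [sum(a) / n for a in arrays]
    ss_between = n * sum((xs - grand) ** 2 for xs in means)
    ss_within = sum((x - xs) ** 2 for a, xs in zip(arrays, means) for x in a)
    ss_total = sum((x - grand) ** 2 for a in arrays for x in a)
    return {
        "between": (ss_between, m - 1),
        "within": (ss_within, N - m),
        "total": (ss_total, N - 1),
    }


# Hypothetical data: three arrays of four readings each.
data = [[21, 23, 22, 24], [25, 27, 26, 26], [20, 19, 22, 21]]
table = one_way_anova(data)
for source, (ss, df) in table.items():
    print(f"{source:8s} {ss:6.2f} {df:3d}")
```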

Computation 

6.3. When computing the sums of squares for Table 6.3, the various deviations may be found explicitly, and squared and added. If the arithmetic is correct, the sums of squares between and within arrays should add up to give the total; this provides a check. Usually, however, this process is laborious, and it is convenient to apply a modification of equation (1.2), p. 37. If the original units are used, so that h = 1, the appropriate part of equation (1.2) may be written

$$S(x - \bar x)^2 = Sx^2 - \frac{T^2}{N}, \qquad (6.4)$$

where S, $\bar x$ and N have their usual meanings and T is the total value of the variate for the N observations, i.e. $T = Sx = N\bar x$.
Applying this to equation (6.2) and writing $T = S_sS'x$ and $T_s = S'x$ for the sth shrub,* we have

$$S_sS'(x - \bar x)^2 = S_sS'x^2 - \frac{T^2}{N},$$

$$S_sS'(x - \bar x_s)^2 = S_sS'x^2 - \frac{S_sT_s^2}{n}, \qquad (6.5)$$

$$nS_s(\bar x_s - \bar x)^2 = \frac{S_sT_s^2}{n} - \frac{T^2}{N}.$$

The process of computation may be written in words:

(1) square the individual values and add;

(2) find the total value of the variate for each array, square, add, and divide the result by the number of individuals per array; and

(3) find the total value of the variate for all individuals, square, and divide by the grand total number of individuals.

Then, if (1), (2) and (3) represent the results of the above operations, equation (6.2) becomes

$$[(1) - (3)] = [(1) - (2)] + [(2) - (3)].$$
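The recipe (1), (2), (3) is sketched below for equal-array data. The optional `origin` and `h` arguments (hypothetical names) apply the coding by an arbitrary origin and constant described later in this section, with the results multiplied back by $h^2$; the coded and uncoded calculations give the same sums of squares.

```python
def one_way_anova_totals(arrays, origin=0.0, h=1.0):
    """Between, within and total sums of squares by equations (6.5).

    Values are first coded as (x - origin) / h; the results are multiplied by h**2.
    Assumes equal numbers n of readings in the m arrays.
    """
    coded = [[(x - origin) / h for x in a] for a in arrays]
    m, n = len(coded), len(coded[0])
    N = m * n
    q1 = sum(x * x for a in coded for x in a)     # (1) S S' x^2
    q2 = sum(sum(a) ** 2 for a in coded) / n      # (2) S T_s^2 / n
    q3 = sum(sum(a) for a in coded) ** 2 / N      # (3) T^2 / N
    return tuple(h * h * q for q in (q2 - q3, q1 - q2, q1 - q3))


data = [[21, 23, 22, 24], [25, 27, 26, 26], [20, 19, 22, 21]]
print(one_way_anova_totals(data))                # (62.0, 12.0, 74.0)
print(one_way_anova_totals(data, origin=22.0))   # same sums of squares
```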

When the data are grouped, readers should have no difficulty in applying equation (1.3), p. 38. In such instances, it is better not to use Sheppard's corrections, but to keep the grouping fairly fine so that they are unimportant. The precise effect of these corrections is not certain; but they increase the apparent association by reducing the residual variance, so to neglect them in testing significance is to be on the safe side.

It may be convenient to measure the values of the variate as deviations from some arbitrary origin and to divide them by an arbitrary constant h before summing and squaring. Then, after applying equations (6.5), it is only necessary to multiply the resultant sums of squares by $h^2$. The process of computation based on equations (6.5) may easily be performed exactly, with the aid of a table of squares, down to the last stages of dividing by the numbers of individuals, and since there are only two such divisions, these may be performed with ample accuracy without much labour.

* These values of T are not the total numbers of individuals but are the total amounts of variate in the several arrays. In Table 6.1, $T_s$ is the total number of ovules in the 100 ovaries from the sth shrub.

As an example, we have analysed the variance of the 100 readings of Table 1.3, p. 32, into two portions, between groups of 5 and within a group. The results are in Table 6.4. Since the groups are random samples, $\sigma_s^2 = 0$, and the two estimates of variance should be nearly equal, as indeed they are; actually the variance within a group is less than that between groups, and we will test the significance of this difference. In the notation of section 5.4,

$$z = \tfrac{1}{2}\left[\log_e 95.18 - \log_e 88.20\right] = 0.0381, \qquad n_1 = 19, \quad n_2 = 80.$$

TABLE 6.4
Analysis of Variance (Data of Table 1.3)

Source of Variation    Sum of Squares    Degrees of Freedom    Variance
Between groups             1 808.35              19              95.18
Within a group             7 056.40              80              88.20
Total                      8 864.75              99

From Fisher's table we see that when $n_1 = 24$ and $n_2$ = infinity, a z of 0.2085 lies on the 0.05 point (the 0.05 point for $n_1 = 19$ and $n_2 = 80$ is larger still), so our z of 0.0381 is quite insignificant, and the two estimates of variance are equivalent.
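Fisher's z for Table 6.4 is a one-liner. The sketch below (hypothetical helper name) works from the sums of squares, so it uses the unrounded variances and gives 0.038, in agreement with the 0.0381 obtained above from the rounded values.

```python
import math


def fisher_z(v_larger, v_smaller):
    """Fisher's z = 1/2 (log_e v1 - log_e v2), with v1 the larger variance."""
    return 0.5 * (math.log(v_larger) - math.log(v_smaller))


# Table 6.4: sums of squares divided by their degrees of freedom.
v_between = 1808.35 / 19
v_within = 7056.40 / 80
z = fisher_z(v_between, v_within)
print(f"z = {z:.3f} with n1 = 19, n2 = 80")  # z = 0.038 with n1 = 19, n2 = 80
```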


Tables with Non-uniform Array Totals 

6.4. The analysis of variance may be performed on tables in which the numbers of individuals in the arrays are not equal. If the numbers in the arrays are $n_1, n_2, \ldots, n_m$, equation (6.2) becomes

$$S_sS'(x - \bar x)^2 = S_sS'(x - \bar x_s)^2 + S_s n_s(\bar x_s - \bar x)^2, \qquad (6.6)$$

and $S_s(T_s^2/n_s)$ is written for $S_sT_s^2/n$ in equations (6.5).

In such instances, the relations of equations (6.3) do not hold, for in summing the squares of the deviations of the group means from the grand mean, each group has been given a different weight, $n_s$. If the true variance between groups ($\sigma_s^2$) is zero, however, $V_s \doteq V_r$, and the test for the existence of association is that $V_s$ and $V_r$ are significantly different.
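For unequal arrays the computing formulae change only in that each $T_s^2$ is divided by its own $n_s$. A minimal sketch, with a hypothetical function name and hypothetical data of three arrays of sizes 3, 5 and 2:

```python
def one_way_anova_unequal(arrays):
    """Sums of squares of equation (6.6) when the arrays differ in size."""
    k = len(arrays)
    N = sum(len(a) for a in arrays)
    T = sum(sum(a) for a in arrays)
    q1 = sum(x * x for a in arrays for x in a)        # S S' x^2
    q2 = sum(sum(a) ** 2 / len(a) for a in arrays)    # S (T_s^2 / n_s)
    q3 = T * T / N                                    # T^2 / N
    return {
        "between": (q2 - q3, k - 1),
        "within": (q1 - q2, N - k),
        "total": (q1 - q3, N - 1),
    }


# Hypothetical arrays of sizes 3, 5 and 2.
table = one_way_anova_unequal([[3, 5, 4], [8, 6, 7, 9, 5], [2, 1]])
print(table)  # {'between': (47.5, 2), 'within': (12.5, 7), 'total': (60.0, 9)}
```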
TABLE 6.5

                                          Nest Type
Length (mm.)   Meadow Pipit   Tree Pipit   Hedge Sparrow   Robin   Pied Wagtail   Wren   Totals


1 

19-65 

1 



_ 



I 

t 

19*85 

— 

— 

— 

— 

— 

I 

I 

1 

\ 

20*05 ' 

I 

— 

— 

— 

— 

I 

2 

20*25 

— 

— 

— 

— 

— 

I 

I 


20*45 

T 

— 

— 

— 

— 

— 

— 

CO 

i 0 

20*65 

I 

— 

— 

— 

— 

— 

I 

IS 

20*85 

I 

— 

I 

— 

— 

3 

5 

> 

21*05 

— 

I 

— 

I 

I 

3 

6 

a 

u 

21 *25 



— 

— 

— 

— 

I 

I 

a 

d) . 

31*45 

— 

— 

— 

— 

— 

I 

I 

0 

•w* 

21*65 

3 

— 

I 

— 

— 

— 

4 

• 

g 

21*85 

3 

I 

— 

I 

3 

— 

8 

' c 









C 









d 

•Fi 

22*05 

10 

I 

I 

3 

I 

3 

^9 

GC 

bo 


8 




• 



bo 

W 

22*25 


— 

I 

— ■ 

I 

10 

22*45 

3 

I 

— 

2 

I 

— 

7 

00 

0 

0 

22*65 

2 

I 

— 

I 

I 

— 

5 

^ ' 
0 

22*85 

4 

— 

I 

— 

— 

— 

5 

3 

0 

23-05 

I 

— 

4 

5 

2 

— 

12 


23*25 ^ 

2 

3 

— 

I 

I 

— 

7 

0 

r! 

23 ‘45 

I 

2 

I 

— 

I 

— 

5 

-M 

GO 

23-65 

I 

I 

— 

— 

— 

— 

2 

d 

OJ 

23-85 

I 

I 

3 

1 

— 

— 

6 

1-4 

24*05 

. — 

3 

I 

— 

3 

— 

7 


24*25 

I 

— 

— 

— 

— 

— 

I 


24*45 

I 

— 

— 

— 

— 

— 

I 


24*65 

— 

— 

— 

— 

— 

— 

— 


24*85 

— 


— 

— 

I 

— 

I 


25*05 

■ 

■ * 

I 

— 

__ 

— 

I 

Totals $n_s$        45       15       14       16       15       15      120
$T_s$               56       78       75       42       64      -69      246
$T_s^2/n_s$      69.69   405.60   401.78   110.25   273.07   317.40   504.30 ($T^2/N$)
Table 6.5 gives distributions of the lengths of cuckoos' eggs found in the nests of a variety of other birds (Latter, 1902). The total frequencies for the kinds of nest vary, and since the observations are fewer, the association is not quite so obvious as in the previous example. The grouping is rather fine for such a small sample, but has been adopted because Sheppard's corrections will not be applied. The values have been measured as deviations from the length 22.05 mm. in terms of the unit of grouping, 0.2 mm. The quantities $n_s$, $T_s$ and $T_s^2/n_s$ are given in these units at the foot of the table, and we may calculate that

$$S_sS'x^2 = 3\,934, \qquad S_s\frac{T_s^2}{n_s} = 1\,577.79, \qquad \frac{T^2}{N} = 504.30.$$




TABLE 6.6
Analysis of Variance (Lengths of Cuckoos' Eggs)

Source of Variation    Sum of Squares    Degrees of Freedom    Variance
Between nest types         1 073.49               5              214.7
Within a nest type         2 356.21             114               20.7
Total                      3 429.70             119


From these, the terms of equation (6.6) are found and inserted in Table 6.6. The variance between nest types is greater than the residual, and we will test it for significance. We find

$$z = \tfrac{1}{2}\left[\log_e 214.7 - \log_e 20.7\right] = 1.17, \qquad n_1 = 5, \quad n_2 = 114.$$

For $n_1 = 5$ and $n_2 = 60$, z = 0.779 lies on the 0.1 per cent. point, so we must conclude that the association between egg-length and type of nest is real. These variances are in terms of the arbitrary units (0.2 mm.), and if they are required in mm.² they must be multiplied by 0.04.
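The entries of Table 6.6 and the value of z can be reproduced from the quantities at the foot of Table 6.5 and the value of $S_sS'x^2$ quoted above. This sketch works throughout in the coded units (0.2 mm. from 22.05 mm.); variances would be multiplied by 0.04 for mm.².

```python
import math

n_s = [45, 15, 14, 16, 15, 15]     # Meadow Pipit, Tree Pipit, ..., Wren
T_s = [56, 78, 75, 42, 64, -69]    # array totals in units of 0.2 mm. from 22.05 mm.
sum_x2 = 3934                      # S S' x^2 in the same units

N, T = sum(n_s), sum(T_s)
q2 = sum(t * t / n for t, n in zip(T_s, n_s))   # S (T_s^2 / n_s)
q3 = T * T / N                                  # T^2 / N
ss_between, ss_within, ss_total = q2 - q3, sum_x2 - q2, sum_x2 - q3
df_between, df_within = len(n_s) - 1, N - len(n_s)

v_b, v_r = ss_between / df_between, ss_within / df_within
z = 0.5 * math.log(v_b / v_r)
print(f"{ss_between:.2f} {ss_within:.2f} {ss_total:.2f}")  # 1073.49 2356.21 3429.70
print(f"z = {z:.2f} on ({df_between}, {df_within})")       # z = 1.17 on (5, 114)
```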


Relation between z and t Tests

6.5. A special case of analysis occurs when there are two groups; this is the comparison between two samples. If there are $n'_1$ and $n'_2$ observations in the two groups, $\bar x_1$ and $\bar x_2$ are their means, and $\bar x$ is the grand mean, the analysis of variance is as set out in Table 6.7. $S_1$ and $S_2$ are the summations over samples 1 and 2, and S is the summation over the whole; also, if the variance between samples, $V_b$, is greater than $V_r$,

$$z = \tfrac{1}{2}\left[\log_e V_b - \log_e V_r\right] = \tfrac{1}{2}\log_e \frac{V_b}{V_r},$$

and the degrees of freedom are $n_1 = 1$ and $n_2 = n'_1 + n'_2 - 2$.

The square root of the ratio of variances is the t of equation (5.3), p. 115, which was used in testing the difference between two means. Thus, if we make the transformation

$$z = \tfrac{1}{2}\log_e t^2 = \log_e t, \qquad \text{or} \qquad t = e^{z},$$

the distributions of t for n degrees of freedom, say, and of z for $n_1 = 1$ and $n_2 = n$ are equivalent. This is because z and t are mathematically related; for any value of t there is only one value of z.

TABLE 6.7
Analysis of Variance (Two Groups)

Source of Variation   Sum of Squares                                       Degrees of Freedom     Variance
Between samples       $n'_1(\bar x_1 - \bar x)^2 + n'_2(\bar x_2 - \bar x)^2$    1                      $V_b = \dfrac{n'_1 n'_2}{n'_1 + n'_2}(\bar x_1 - \bar x_2)^2$
Within a sample       $S_1(x - \bar x_1)^2 + S_2(x - \bar x_2)^2$               $n'_1 + n'_2 - 2$      $V_r$
Total                 $S(x - \bar x)^2$                                       $n'_1 + n'_2 - 1$

A slight modification in interpretation of the z test is necessary before we can show its equivalence to that based on t. For the simple use of z to test the difference between two variances, we showed in section 5.4 that the 0.05 level of significance is about the 2.5 per cent. point; on the other hand, when used as an alternative to t, the 0.05 point of z is also the 0.05 level. When testing two variances, the hypothesis is that in the population z = 0, and a positive or negative deviation, if sufficiently large, may be incompatible with this.



is 0.05 whole curve is the 0.05 level of significance (since the whole of the curve is equivalent to the positive half of that for t). The difference between the two treatments of z lies in the type of question asked. In the former use we ask if z is significantly different from zero, and a large negative value may be so; in the latter use, for comparing two means, we ask if z is significantly greater than minus infinity (i.e. if t is greater than zero), and no negative value can be so. A large negative value of z is quite compatible with the hypothesis that the means of the two samples are really equal, for it only shows that t is smaller than would be expected, and that the samples are a little too much alike. These varying distinctions between 0.05 points and levels of significance are very puzzling, but they should be clearly understood.

We find from the tables that when n = 24, t = 2.064 lies on the 0.05 level, whence z = log_e 2.064 = 0.725 for $n_1 = 1$ and $n_2 = 24$ should also lie there. Fisher's tables of z give the value 0.7246 as the 0.05 point.
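A small numerical check of the relation between t and z: for two hypothetical samples the sketch computes t with the pooled within-sample variance and z from the analysis of variance of Table 6.7, and confirms that $t^2$ equals the variance ratio and that $t = e^z$. The function name is an assumption.

```python
import math


def two_sample_t_and_z(a, b):
    """Return (t, variance ratio, z) for two samples, following Table 6.7."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss_within = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    v_r = ss_within / (n1 + n2 - 2)                 # within-sample variance
    v_b = n1 * n2 / (n1 + n2) * (m1 - m2) ** 2      # between samples, 1 d.f.
    t = abs(m1 - m2) / math.sqrt(v_r * (1 / n1 + 1 / n2))
    z = 0.5 * math.log(v_b / v_r)
    return t, v_b / v_r, z


# Hypothetical samples of 5 and 6 readings.
t, ratio, z = two_sample_t_and_z([5.1, 4.8, 5.6, 5.3, 4.9], [4.2, 4.6, 4.4, 4.0, 4.5, 4.3])
print(f"t = {t:.3f}, t^2 = {t * t:.3f}, ratio = {ratio:.3f}, e^z = {math.exp(z):.3f}")

# The 0.05 point of t for 24 degrees of freedom, converted to z = log_e t:
print(f"{math.log(2.064):.4f}")  # 0.7246
```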


Significances of Groups of Means 

This process of analysis of variance is the better method of testing the differences between several means as a whole, referred to in section 5.5, and is eminently suited to small samples; for the distributions of z which are used are exact, provided the form of the residual deviations is normal and their variance is constant for all means. When Fisher's tables of z are used to test the significance of differences between several means, the 0.05 point is also the 0.05 level of significance, as we have just shown for pairs of means.

* The fact that we interchange the degrees of freedom, choosing $n_1$ to be associated with the larger variance and keeping z positive, is purely a matter of computing convenience, and avoids the necessity of having duplicate tables with negative values of z. The complete distribution of z contains negative deviations from zero.
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Table 6.8 is taken from a paper by Warren (1909), and shows the 
mean head breadths of numbers of termites (small soldiers) taken 
from five nests during five months.* 

We will first analyse the variance into two parts, between nests 
and within a nest. The actual deviations from the grand and nest 
means have been squared and summed and the sums are entered 
in Table 6.9. The between-nest variance is much greater than 
its residual, giving a jsr of 0*829, "^hich lies beyond the o*oi point 
(z= 0*7443 lies on it); so the variation from nest to nest is real. 
This merely confirms 'what would be expected from an examination 
of Table 16 . 8 , for nests 670 and 675 have the highest means in all 

TABLE 6.8 


Mean Head Breadths of Termite (Small Soldiers) in Mm. 


Nest Number . . 

66S 

i 

670 

672 

674 

675 

f 

Means 

November . . 

2*273 

2-479 

2*404 

2-447 

2*456 

2*411 8 

January 

2*333 

2*603 

2*457 

2*3^8 

2*626 

2*481 2 

March 

3 '375 

2*613 

2*452 

2-515 

2*633 

2*517 6 

May 

2-373 

21*557 

2*396 

2*445 

2*487 

2*451 6 

August 

2*318 

2*377 

2*279 

2*312 

2*410 

2*339 2 

Means . . 

2*334 2 

2*525 8 

2*397 6 

2*421 4 

2*5224 

2*4403 


months, 672 and 674 nearly always come next, and 668 has the 
lowest mean in all but the month of August. 

The reality of the differences between months is not quite so clear, for although August has the lowest mean in all nests but one, and March the highest in all but another, the other months show no consistent differences. We can, however, test this in the same way by analysing the total variance into two parts, as in the second part of Table 6.9. The "between months" variance is certainly greater than the residual, but, being barely on the 0.05 point (z = 0.5265 is just on it), it is of doubtful significance; hence on the simple analysis the seasonal variation, though suggestive, is not well established. The situation is complicated, however, by the nest variation, and this treatment is not quite sufficient; more complete methods will be dealt with in section 10.3.

* This sort of table must not be confused with one like Table 6.1. Here the entries are measurements, there they are frequencies; here the characters are three in number (head breadth, nest number, and month of year), there they are only two (ovules per ovary and shrub number).
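As a check on Table 6.9, both simple analyses can be recomputed from Table 6.8. The following is a minimal Python sketch, not the author's own procedure: it treats the five nests (or the five months) as arrays, splits the total sum of squares into between-array and within-array parts, and forms Fisher's z as half the natural logarithm of the variance ratio. It assumes the January entries for nests 668 and 674 are 2.332 and 2.388, the values that reproduce the printed means and sums of squares; the helper names `one_way` and `z_statistic` are illustrative only.

```python
import math

# Table 6.8: mean head breadths (mm); rows are months, columns are nests.
NESTS = [668, 670, 672, 674, 675]
MONTHS = ["November", "January", "March", "May", "August"]
BREADTH = [
    [2.273, 2.479, 2.404, 2.447, 2.456],
    [2.332, 2.603, 2.457, 2.388, 2.626],
    [2.375, 2.613, 2.452, 2.515, 2.633],
    [2.373, 2.557, 2.396, 2.445, 2.487],
    [2.318, 2.377, 2.279, 2.312, 2.410],
]

def one_way(groups):
    """Split the total sum of squares into between-group and within-group parts."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    total = sum((v - grand) ** 2 for v in values)
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return between, total - between, total

def z_statistic(between, within, df_between, df_within):
    """Fisher's z: half the natural log of the ratio of the two variances."""
    return 0.5 * math.log((between / df_between) / (within / df_within))

by_nest = [list(column) for column in zip(*BREADTH)]  # each nest is an array
by_month = [list(row) for row in BREADTH]             # each month is an array

b_n, w_n, tot = one_way(by_nest)
b_m, w_m, _ = one_way(by_month)
z_nests = z_statistic(b_n, w_n, 4, 20)
z_months = z_statistic(b_m, w_m, 4, 20)

print(f"nests:  between {b_n:.6f}  within {w_n:.6f}  z = {z_nests:.3f}")
print(f"months: between {b_m:.6f}  within {w_m:.6f}  z = {z_months:.3f}")
print(f"total           {tot:.6f}")
```

The nest analysis gives z of about 0.829, beyond the 0.01 point, while the month analysis gives about 0.496, a little below the 0.05 point of 0.5265; this is consistent with treating the seasonal effect as doubtful.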

TABLE 6.9

Analysis of Variance (Head Breadths of Termites)

Source of Variation    Sum of Squares    Degrees of Freedom    Variance
Between nests              0.137442               4             0.03436
Within a nest              0.130967              20             0.00655
Between months             0.094046               4             0.02351
Within a month             0.174363              20             0.00872
Total                      0.268409              24                -

6.7. We have shown that the method of the analysis of variance well expresses the heterogeneity of variability. This heterogeneity is obvious when the effect is great, but the analytical method is objective, and so is particularly useful when the effect is smaller and not so obvious. Like all quantitative methods, statistical or otherwise, it merely specifies objective measures for qualities which may be subjectively appreciated in extreme cases. So far, however, we have not been concerned with more than testing for the existence of association, and have paid no attention to the problem of expressing its strength. As in all such tests, the probability of significance, depending as it does on the size of the sample, gives no indication of that strength.

Although the methods and tests described in this chapter are only applicable when the distribution within each array is normal, work by E. S. Pearson and others, published in Biometrika, indicates that moderate skewness, such as is apparent in Table 6.1, is not likely to lead to serious error.
It is further assumed that the variance within the separate arrays is constant within the limits of the errors of random sampling. The residual variance is a mean variance for all arrays. Where non-uniformity is suspected, its significance may be tested by the method of section 5.5.
CORRELATION

Correlation Tables and Scatter Diagrams 
7.1. When dividing the total variability of a quantity into parts, one of which is associated with arrays as in Table 6.1, it is by no means inevitable that the character which defines the arrays should be some qualitative description or serial number; it may also be the groups of some quantitative variate. Tables 7.1 to 7.5 are examples in which the individuals are classified according to two quantitative variates; these are called correlation tables. In the first of these, the arrays are groups of eggs having a small sub-range of length; each array is regrouped according to longitudinal girth. The association between the two quantities is well marked, and it is clear that there is also a tendency for the length to increase with the girth quite regularly. We have thus reached another stage in the treatment of variability; for while the single distribution shows the extent and form of the variation, and the table of arrays introduced in the last chapter shows the association of parts of the variation with other factors, the correlation table now discovers the nature of that association when the other factor is a quantitative variate. It will be noticed, however, that such a table may be approached in two ways; either variate may be regarded as the one which is being analysed, and both the columns and rows are arrays.

When arranging correlation tables, it is convenient to choose the grouping so that there are from ten to twenty groups for each variate; if any observation appears to fall exactly on the dividing line between two groups, a half may be given to each, and similarly it may happen that a quarter of a unit is assigned to each of four adjacent cells in the table. In specifying the characters, either the sub-ranges of the groups may be given, as in Tables 7.1 and 7.4, or the central values, as in Table 7.5. The actual process of making a correlation table may be carried out in two ways. The first is to mark the position of each observation in the table by means of a dot or a stroke and finally to count the marks. This, however, is only suitable for fairly small samples, for if one makes a mistake or loses the place, there is nothing for it but to start again; further, the only means of checking the accuracy of the table is to repeat it. The second method is to write the values of the characters of each individual on a separate card, and to sort the cards; the first sorting is into groups according to one character, and then each group is re-sorted according to the second character. It is then an easy matter to look through each pile of cards to see that none is out of place, and finally to count them and enter the numbers in a table. The writing of the cards takes some time, but it is a saving in the long run, particularly if there are more than two characters; it is often possible to collect the data on cards and so avoid copying.

Fig. 8. Scatter Diagram.
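The rule of giving a half to each of two groups (or a quarter to each of four cells) for observations on a dividing line is easy to mechanise. The sketch below is a hypothetical Python helper (the names `group_shares` and `correlation_table` are ours, not the text's), assuming equal-width groups described as (start, width, number of groups) and treating any value within a small tolerance of an interior boundary as lying on it.

```python
from collections import defaultdict
import math

def group_shares(value, start, width, n_groups):
    """Return (group index, share) pairs for one observation.

    Group k runs from start + k*width to start + (k+1)*width. A value lying
    exactly on an interior dividing line is shared equally between the two
    neighbouring groups.
    """
    pos = (value - start) / width
    k = round(pos)
    if math.isclose(pos, k, abs_tol=1e-9):
        if 0 < k < n_groups:
            return [(k - 1, 0.5), (k, 0.5)]
        index = k  # on an outer edge of the grouping
    else:
        index = math.floor(pos)
    if not 0 <= index < n_groups:
        raise ValueError(f"{value} lies outside the grouping")
    return [(index, 1.0)]

def correlation_table(pairs, x_grouping, y_grouping):
    """Tally (x, y) pairs into a two-way frequency table keyed by (i, j).

    An observation on both dividing lines contributes a quarter to each of
    four adjacent cells.
    """
    table = defaultdict(float)
    for x, y in pairs:
        for i, wx in group_shares(x, *x_grouping):
            for j, wy in group_shares(y, *y_grouping):
                table[(i, j)] += wx * wy
    return dict(table)

# Made-up observations: the second lies on an x boundary, the third on both.
pairs = [(4.1, 10.2), (4.25, 10.2), (4.25, 10.5)]
table = correlation_table(pairs, (4.0, 0.25, 4), (10.0, 0.5, 4))
print(table)
```

Whatever the splitting, the cell frequencies still sum to the number of observations (here 3), which gives a quick check on the tally.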

If the observations are few, a correlation table is not suitable, and its place may be taken by a scatter diagram. This is just an ordinary graph in which x and y are the two variates, and points on it represent observations; Fig. 8 is an example, and shows the relation between length and fineness of the hairs of a number of varieties of cotton.* If the relationship between the two variates is exact, the points in the scatter diagram lie on a smooth curve, while as the relationship weakens, the diagram more and more resembles the result that would be obtained by sprinkling the points evenly with a pepper-pot; the former type of diagram is usually the experience of the physicist. The scatter diagram and correlation table are equivalent, except that in the latter the area is divided into a number of rectangles, and the points in each one are counted and replaced by a numeral. It is well to choose the scales of a diagram so that the range of variation is about the same in both directions, and the diagram is nearly square.

* Data by Morton (1926).

TABLE 7.1 (continued)

Length and Longitudinal Girth of Eggs of the Common Tern

(Data from "A Co-operative Study", 1923)

[Columns for lengths 4.10- to 4.65- cm, with column totals 128, 101, 98, 91, 106, 54, 40, 25, 15, 12, 2, 5; grand total 955. Cell frequencies illegible.]

The Correlation Coefficient

7.2. Correlation tables may show all degrees of association, from the obvious one between the length and girth of eggs in Table 7.1 to the inappreciable one between head length and reaction time to sight in Table 7.5. This quality is measured by the correlation coefficient (r), and the corresponding value is given with each of Tables 7.1 to 7.5; it is small when the association is weak, and large when the relationship is fairly close and strong. We shall now proceed to discuss the meaning of this coefficient and related constants, postponing a description of the methods of estimation and computation to a later part of the chapter. The correlation coefficient may be regarded from three points of view, discussed in the following sections headed Frequency Surfaces, Regression Lines and Analysis of Variance.

Frequency Surfaces 

7.21. Just as a single frequency distribution may be represented graphically by a histogram, so a correlation table may be represented by a similar figure in three dimensions. The base is divided into a number of rectangles representing the cells of the table, by a number of lines parallel to the x- and y-axes, and on these rectangles are raised columns, proportional in volume to the frequency in the corresponding cell of the table. Such a figure is an empirical frequency surface, and it must be noted that frequencies are measured by volumes.

TABLE 7.2

Numbers of Pistils and Stamens in Ranunculus Ficaria

Late Flowers, r = +0.75
(Data by Weldon, 1901)

Such a surface, of course, is made up of step-like figures (rather reminiscent of the basaltic columns of the Giant's Causeway in Northern Ireland) and shows irregularities; but as the size of the sample is increased the irregularities disappear, and as the number of groups is increased the steps become smaller, until in the limit a smooth surface (analogous to the frequency curve for a single variate) results. The progress in devising systems of mathematical formulae to describe such surfaces is less than has been made for the single variate distributions, and methods based on the normal surface are applied to most distributions. The equation to the normal surface is



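$$df = \frac{N}{2\pi\sigma_x\sigma_y\sqrt{1-r^2}}\exp\left\{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_x^2} - \frac{2rxy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right)\right\}dx\,dy$$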

where x and y are the two variates measured as deviations from their means, $\sigma_x$ and $\sigma_y$ are constants equal to the standard deviations of the two variates, r is a constant equal to the correlation coefficient, N is the total number in the sample, and df is the element of frequency in the range dx dy with its centre at (x, y). If we cut this surface by any vertical plane parallel to the x- or y-axis, the section is a normal frequency curve of standard deviation $\sigma_x\sqrt{1-r^2}$ or $\sigma_y\sqrt{1-r^2}$ respectively. Fig. 9 shows the contours of surfaces in which the correlation coefficients are 0, 0.3, 0.6 and 0.9, and the standard deviations of x and y are both equal to unity; it will be seen that the surfaces all rise to a hump at the centre, but that they tend also to form a diagonal ridge which becomes narrower and sharper as r increases, as would be expected from the fact that the standard deviation of a sectional curve becomes smaller as r approaches unity. The correlation coefficient can never be greater than unity, and as it approaches that value, the ridge tends to become a thin outline in the form of a normal curve running diagonally; when r = 0, vertical planes through the centre parallel to the x- and y-axes divide the surface into four equal quadrants. The correlation coefficient can be negative, but the only difference that makes is to cause the major axes of the ellipses to follow the other diagonal.
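Reading the surface equation term by term, a minimal Python sketch (the function names `normal_surface` and `section_sd` are hypothetical) evaluates the frequency per unit area, df/(dx dy), and can be used to confirm numerically two statements above: the volume under the surface is N, and a section at fixed y is a normal curve whose standard deviation is $\sigma_x\sqrt{1-r^2}$.

```python
import math

def normal_surface(x, y, n, sd_x, sd_y, r):
    """Frequency per unit area of the normal correlation surface at (x, y).

    x and y are deviations from their means; multiply by dx*dy to get df.
    """
    one_minus_r2 = 1.0 - r * r
    quad = (x / sd_x) ** 2 - 2.0 * r * x * y / (sd_x * sd_y) + (y / sd_y) ** 2
    scale = n / (2.0 * math.pi * sd_x * sd_y * math.sqrt(one_minus_r2))
    return scale * math.exp(-quad / (2.0 * one_minus_r2))

def section_sd(sd, r):
    """Standard deviation of a vertical section of the surface."""
    return sd * math.sqrt(1.0 - r * r)

# Section standard deviations for the four surfaces of Fig. 9 (unit standard deviations).
for r in (0.0, 0.3, 0.6, 0.9):
    print(r, round(section_sd(1.0, r), 3))
```

The narrowing of the sections as r rises (0.954 at r = 0.3, 0.800 at 0.6, 0.436 at 0.9) is the sharpening ridge described above.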


Regression Lines

7.23. An easier way of treating a correlation table is to draw a curve relating the means of the arrays of x and y. This can be done in two ways; either we may find the mean of y for every array of x (i.e. referring to Table 7.1, find the mean girths when the lengths of the eggs are successively 3.55-3.60, 3.70-3.75, 3.75-3.80, etc., cm.) or we may find the mean of x for every array of y (mean lengths when the girths are successively 9.80-9.90, 10.30-10.40, 10.40-10.50, etc., cm.). In Fig. 10 are given the array means for each of




Fig. 9. Normal Frequency Surfaces.
Fig. 10. Regression Diagrams.











the sample were increased indefinitely. The simplest curve that can be drawn, and one which is often a sufficient approximation, is the straight line, and appropriate ones are drawn on Fig. 10. For the maximum and minimum temperatures the circles giving the mean y for values of x lie above the straight line at both extremes, suggesting that the straight line is not quite adequate; however, this is typical of the sort of data to which this method of correlation is often applied as an approximation, and we will use the straight lines. The two sets of means give two straight lines, which become more divergent as the association decreases; indeed, the diagram corresponding to Table 7.5 shows the two lines to be practically perpendicular. The line which is less inclined to the x-axis has x as the independent variable, and similarly the other, being more nearly parallel to the y-axis, has y as the independent variable.

TABLE 7.11

* The means of the extreme groups are naturally very inaccurate, being based on very few observations, and it is usual to combine several such groups. For example, in Table 7.1, if we assume the central values of length to be 3.575, 3.625, etc., cm., and those of girth to be 9.85, 9.95, etc., cm., we have the means in Table 7.11, with length as the independent variable. To combine the three readings, we find the weighted means of the length and girth. That of the length is

$$\frac{3.575 + 2\times 3.725 + 6\times 3.775}{9} = 3.742,$$

and similarly that of the girth is 10.59. Points obtained in this way are shown by larger circles and dots in Fig. 10.
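The pooling of sparse end groups described in the footnote is an ordinary weighted mean. A minimal Python sketch, with a hypothetical helper name and the frequencies 1, 2 and 6 taken from the weights in the footnote's formula:

```python
def combined_mean(centres, counts):
    """Weighted mean of several small groups pooled into one array."""
    total = sum(counts)
    return sum(c * n for c, n in zip(centres, counts)) / total

# Central lengths (cm) and numbers of eggs in the three occupied end groups.
length = combined_mean([3.575, 3.725, 3.775], [1, 2, 6])
print(round(length, 3))  # prints 3.742
```

The same helper applied to the girths of those nine eggs would give the combined girth, but that requires the individual cell frequencies of Table 7.1.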

The general equation of a straight line with x as the independent variable is

$$Y = ax + b,$$

where a and b are constants and the capital letter Y denotes the value of the dependent variable given by the line, as distinguished from the actual value of an individual, denoted by y. The problem of estimating the most appropriate values of a and b to fit a line to any given data is analogous to that of determining estimates of parameters for fitting frequency distributions, and the general method of solution that is most in accordance with statistical theory is the method of least squares. This consists in choosing the constants a and b so that the sum over all individuals of the quantities $(y - Y)^2$* is reduced to a minimum. A similar line may be obtained giving X in terms of y. The equations to the two lines are deduced in section 7.6 and may be written

$$(Y - \bar y) = r\frac{\sigma_y}{\sigma_x}(x - \bar x) \qquad (7.1)$$

$$(X - \bar x) = r\frac{\sigma_x}{\sigma_y}(y - \bar y) \qquad (7.11)$$

where $\bar x$ and $\bar y$ are the two grand means, $\sigma_x$ and $\sigma_y$ are the two standard deviations, and r is the correlation coefficient. The value of Y given by equation (7.1) is an estimate of the mean value of the array corresponding to a small sub-range about any given value of x, and that of X given by equation (7.11) is similarly the mean of the array corresponding to a given value of y. Equation (7.1) represents the line that is more nearly parallel to the x-axis; both lines pass through the point $(\bar x, \bar y)$. These are called regression lines and the constants

$$r\frac{\sigma_y}{\sigma_x} \quad\text{and}\quad r\frac{\sigma_x}{\sigma_y}$$

are called regression coefficients. The coefficients are the tangents of the angles the two lines make with the x- and y-axes respectively.

* In a diagram, these are the distances of the points from the line measured in a direction parallel to the y-axis. They are not perpendicular distances from the line. Further, these points represent individuals, and not array means. If array means are used, a weighted sum of $(\bar y_s - Y)^2$ is reduced to a minimum and a similar result is obtained.
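Equations (7.1) and (7.11) can be checked numerically: the least-squares slopes computed directly from paired observations coincide with $r\sigma_y/\sigma_x$ and $r\sigma_x/\sigma_y$. The sketch below is illustrative Python on five made-up pairs; the function name `regression_lines` is ours, and the standard deviations use divisor N, which cancels in the ratios.

```python
import math

def regression_lines(xs, ys):
    """Least-squares regression lines of y on x and of x on y.

    Returns (slope, intercept) for each line, the correlation coefficient r,
    and the two standard deviations (divisor N).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b_yx = sxy / sxx            # regression coefficient of y on x, equation (7.1)
    b_xy = sxy / syy            # regression coefficient of x on y, equation (7.11)
    r = sxy / math.sqrt(sxx * syy)
    sd_x, sd_y = math.sqrt(sxx / n), math.sqrt(syy / n)
    return {
        "y_on_x": (b_yx, my - b_yx * mx),
        "x_on_y": (b_xy, mx - b_xy * my),
        "r": r, "sd_x": sd_x, "sd_y": sd_y,
    }

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
fit = regression_lines(xs, ys)
b_yx, b_xy, r = fit["y_on_x"][0], fit["x_on_y"][0], fit["r"]
print(b_yx, r * fit["sd_y"] / fit["sd_x"])   # the two agree
print(b_yx * b_xy, r * r)                     # product of the coefficients equals r squared
```

The product of the two coefficients being $r^2$ is why the lines coincide only when $r = \pm 1$, and draw apart as r falls.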

If r = 0, equation (7.1) gives $(Y - \bar y) = 0$, an equation which is satisfied by a line parallel to the x-axis and going through the mean, $\bar y$; similarly equation (7.11) gives $(X - \bar x) = 0$, and this is satisfied by a line through the mean, $\bar x$, and parallel to the y-axis; the two lines are perpendicular. The meaning of these two lines is that the mean value of y is the same for all values of x, and that the mean of x does not change with y, or in other words, that there is no association between x and y; and thus we see that r is zero for zero association or independence. If r = 1, one regression coefficient is equal to the inverse of the other, and the two lines coincide. This is the condition for perfect association, i.e. when x is uniquely determined by y (and vice versa), and is the state of affairs at which the physicist aims when he controls his experiment so that only the two factors vary and there are no extraneous disturbing influences. The biologist usually has to deal with materials subject to variations over which his control is limited, and consequently he obtains the kind of result illustrated in Tables 7.1 to 7.5, where varying strengths of relationship are shown. If these tables are studied with Fig. 10, it will be noticed that the stronger the relationship, the greater is r and the closer together are the two regression lines. If $\sigma_x = \sigma_y$, or if the diagrams are plotted on such a scale that units of $(x - \bar x)/\sigma_x$ and $(y - \bar y)/\sigma_y$ are equal, the tangent of the angle between each regression line and the axis of the corresponding independent variable becomes equal to r. If r is positive, the slopes of the lines are in the direction indicating that x increases with y, while if r is negative the slopes are in the opposite direction, showing that x increases as y decreases.

Analysis of Variance

The preceding section, dealing with regression, pays attention to the relationship between x and y, regarding the deviations rather as errors due to uncontrollable factors; we will now consider the correlation table from the point of view of the variability. Adopting the method of the last chapter, we can analyse the variance of, say, girth (Table 7.1) into two portions, one associated with differences between the array means for the various groups of length, and the other with residual deviations within the arrays. This can be done by finding the individual array means, as was done in the last chapter, and calculating their variance and the variance of the residual deviations from them. For Table 7.1 we should combine the first five arrays and the last two, giving 18 arrays, or 17 degrees of freedom on which to estimate the variance between arrays, and 937 on which to estimate the residual. If, however, we are satisfied that a straight regression line like equation (7.1) sufficiently represents the trend, the value Y of the girth given by this line for various central values of the length (x) may be used instead of the array means. For Table 7.1 the line of regression of y on x (as computed in section 7.5) is

$$(Y - 11.378) = 1.756\,(x - 4.190);$$

when x = 3.575, Y − 11.378 = −1.080 and Y = 10.298;
when x = 3.725, Y − 11.378 = −0.817 and Y = 10.561;

and so on. For the sum of squares of the deviations of these regression values from the grand mean (11.378) we take $1 \times (-1.080)^2 + 2 \times (-0.817)^2$, etc., adding these quantities for all arrays; for the squares of the residual deviations from the regression values we take $(9.85 - 10.298)^2 + (10.35 - 10.561)^2 + (11.05 - 10.561)^2$, etc., each multiplied by its cell frequency and added for all cells in the table, while for the total sum of squares we use the marginal totals as usual. These three sums may be found and entered in a table of analysis of variance in just the same way as if we had used the actual array means. This process can be carried out explicitly as indicated, and the reader is recommended to do it as an exercise; alternatively, all the various sums of squares may be found from some of the constants of the table as shown below. We will set out the process algebraically.
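The explicit route just described can be sketched in a few lines of Python. This is illustrative only: it uses the regression line quoted above and reproduces just the first terms of the two sums (the arrays and cells named in the text), since the full sums run over every array and cell of Table 7.1; the names `regression_value`, `arrays` and `cells` are ours.

```python
MEAN_X, MEAN_Y, SLOPE = 4.190, 11.378, 1.756  # regression of girth on length, as quoted

def regression_value(length):
    """Girth Y given by the straight regression line for a central length."""
    return MEAN_Y + SLOPE * (length - MEAN_X)

# (central length, number of eggs) for the first two arrays quoted above
arrays = [(3.575, 1), (3.725, 2)]
# (central length, central girth, cell frequency) for the cells quoted above
cells = [(3.575, 9.85, 1), (3.725, 10.35, 1), (3.725, 11.05, 1)]

for centre, n in arrays:
    Y = regression_value(centre)
    print(f"x = {centre}: Y - 11.378 = {Y - MEAN_Y:.3f}, Y = {Y:.3f}")

# First terms only of the "regression" and "residual" sums of squares.
between_terms = [n * (regression_value(c) - MEAN_Y) ** 2 for c, n in arrays]
residual_terms = [f * (g - regression_value(c)) ** 2 for c, g, f in cells]
print(sum(between_terms), sum(residual_terms))
```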

Repeating equation (6.6), section 6.4, except that the variable is changed to y, we have for the first method of analysis

$$SS'(y - \bar y)^2 = SS'(y - \bar y_s)^2 + S\,n_s(\bar y_s - \bar y)^2, \qquad (7.2)$$

and for the second method, using the Y given by equation (7.1) instead of $\bar y_s$, we have

$$S(y - \bar y)^2 = S(y - Y)^2 + S(Y - \bar y)^2, \qquad (7.3)$$
where S = SS' is the summation over all individuals. This may be expressed in words: the sum of squares of deviations from the grand mean equals the sum of squares of deviations from the regression line plus the sum (over all observations) of squares of deviations of the regression values from the grand mean. The terms of equation (7.3) are entered in Table 7.6, and it is shown in section 7.6 that the sums of squares are severally equal to those terms in the table which involve $r^2$. The constant r is again the correlation coefficient.

TABLE 7.6

Analysis of Variance of y (Egg Girth)

Source of Variation        Sum of Squares                              Degrees of Freedom   Variance
Straight regression line   $S(Y-\bar y)^2 = r^2 S(y-\bar y)^2$          1                    $_yv_s$
Residual                   $S(y-Y)^2 = (1-r^2) S(y-\bar y)^2$          N − 2                $_yv_r$
Total                      $S(y-\bar y)^2$                             N − 1                $_yv_T$


TABLE 7.61

Analysis of Variance of x (Egg Length)

Source of Variation        Sum of Squares                              Degrees of Freedom   Variance
Straight regression line   $S(X-\bar x)^2 = r^2 S(x-\bar x)^2$          1                    $_xv_s$
Residual                   $S(x-X)^2 = (1-r^2) S(x-\bar x)^2$          N − 2                $_xv_r$
Total                      $S(x-\bar x)^2$                             N − 1                $_xv_T$

If the reader has found the sums of squares directly as suggested in the last paragraph, he will also be able to check these relations. Similarly, by using the regression line of equation (7.11) we may analyse the variance of x and arrive at the sums of squares in Table 7.61.
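The identities in Tables 7.6 and 7.61 can be verified on any small set of pairs. A minimal Python sketch (the function `variance_analysis` and the five made-up pairs are illustrative) fits the straight regression line by least squares, splits $S(y-\bar y)^2$ into the regression and residual parts of equation (7.3), and confirms that the regression part equals $r^2$ times the total; swapping the arguments gives the analysis of x.

```python
import math

def variance_analysis(xs, ys):
    """Analyse the variance of ys about the straight regression line on xs.

    Returns (sum of squares, degrees of freedom, variance) for each source,
    following equation (7.3) and Table 7.6, together with r.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    fitted = [my + slope * (x - mx) for x in xs]           # the values Y
    regression_ss = sum((f - my) ** 2 for f in fitted)     # S(Y - ybar)^2
    residual_ss = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # S(y - Y)^2
    return {
        "regression": (regression_ss, 1, regression_ss / 1),
        "residual": (residual_ss, n - 2, residual_ss / (n - 2)),
        "total": (syy, n - 1, syy / (n - 1)),
        "r": sxy / math.sqrt(sxx * syy),
    }

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
of_y = variance_analysis(xs, ys)   # as in Table 7.6
of_x = variance_analysis(ys, xs)   # as in Table 7.61
for name in ("regression", "residual", "total"):
    print(name, of_y[name], of_x[name])
```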
We shall now discuss the degrees of freedom. We have seen that in an infinite population for which x and y are independent, the regression of y on x is a line through the mean, $\bar y$, and parallel to the x-axis; it follows that for such a case $(y - Y)$ equals $(y - \bar y)$ and the residual sum of squares equals the total. For a finite sample, however, owing to random errors a regression line will have a slight slope and the residual sum of squares will be different from the total. Moreover, since the regression line is obtained by the method of least squares, the residual must always be less than the total. It follows from a proof given by Fisher (1925b) that if the total sum of squares has n degrees of freedom, the amount of this difference is, on the average, one nth of the total sum of squares, and that the residual sum of squares divided by n − 1 is an unbiassed estimate of the variance of y in the population, having the sampling distribution of an estimate based on n − 1 degrees of freedom.* Applying this result to Table 7.6, and remembering that there are N − 1 degrees of freedom for the total sum of squares, we see that when the association in the population is zero, the three variances in the last column are estimates of the same variance, that of y in the population.

Thus, for zero association in the population, Table 7.6 is analogous to Table 6.3 with two arrays (one degree of freedom). If, now, there is an association between x and y, the difference between the residual and total sum of squares is greater than the proportion due to one degree of freedom, $_yv_r$ is less than $_yv_T$, and $_yv_s$ is greater than either. We may carry the analogy between Tables 6.3 and 7.6 further by regarding $_yv_s$ as the variance due to the regression line; we shall make no attempt, however, to find an analogue to $\sigma_f$ in Table 6.3. The value $_yv_r$ is an estimate of the true variance within arrays, and also of the variance of the distribution represented by a cross-section of the frequency surface in a vertical plane parallel to the y-axis. Similar arguments may be applied to the analysis in Table 7.61.

* This is a special case of a more general result, that if to a series of N independent observations an equation or series of equations involving u constants altogether is fitted by the method of least squares, the sum of squares of the deviations from the values given by the equations, i.e. the residual sum of squares, divided by N − u is an unbiassed estimate of the variance in the population. Thus we may say that each constant so determined from the data absorbs one degree of freedom. The one degree of freedom associated with the linear regression line is due to the slope; the other constant, $\bar y$, is eliminated from both the residual and the total sum of squares. This result also follows from a proof given by Irwin (1931).

This property is a powerful recommendation for the use of the method of least squares in statistical work.
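The degrees-of-freedom argument can be illustrated by simulation rather than proof. The Python sketch below is a hypothetical experiment, not Fisher's or Irwin's derivation: it draws many samples of size n in which x and y are independent normal variates with unit variance, fits the least-squares line each time, and averages the sums of squares. On average the total is about n − 1 times the variance, the residual about n − 2 times, and their difference about one variance, i.e. one degree of freedom absorbed by the slope.

```python
import random

def average_sums_of_squares(n, trials, seed=1):
    """Average total and residual sums of squares of y over many samples of
    size n in which x and y are independent normal variates of unit variance."""
    rng = random.Random(seed)
    total_acc = residual_acc = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        total_acc += syy                        # S(y - ybar)^2
        residual_acc += syy - sxy * sxy / sxx   # S(y - Y)^2 about the fitted line
    return total_acc / trials, residual_acc / trials

n = 10
total, residual = average_sums_of_squares(n, 20000)
# Each of these should be close to the true variance, 1.
print(total / (n - 1), residual / (n - 2), total - residual)
```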

It is thus seen that the two straight lines found by least squares are those which make the residual variances a minimum (or make the variance associated with themselves a maximum), and r is a measure of the amount of the total variation in each quantity associated with the appropriate line. If r = 1, the residual variance is zero and all the variation in one quantity is explained by variation in the other, while if r = 0, the residual variance is as great as the total, and there is no association between x and y. This point of view (since it uses $r^2$) takes no account of the sign of r, but is merely concerned with measuring the strength of association without troubling whether y increases with x or whether it increases as x decreases.

General Discussion 

7.3. The correlation coefficient is much used as a measure of the strength of association, but it is not easy for the beginner to appreciate its scale of values. It may vary between plus and minus unity, but the sign merely shows the direction of the trend (whether x increases with y or whether one increases as the other decreases), and so as a measure of association we are only concerned with values between 0 and 1; a correlation of −0.5 is as good as one of +0.5. If we consider samples in which N is so large that for all practical purposes N − 1 = N − 2, we see from Tables 7.6 and 7.61 that $v_r/v_T$, the ratio of the residual to the total variance, is practically equal to $(1 - r^2)$, and that the ratio of the corresponding standard deviations is $\sqrt{1 - r^2}$. These quantities are given in Table 7.7, and the meaning of any value of the correlation coefficient between two quantities is that if we keep one quantity constant, the variability of the other is reduced in the ratio given in the third column of the table. The scale is very uneven, since a correlation of 0.6 reduces the residual standard deviation to 0.8 of the total, while even if the correlation is as high as 0.9, the residual deviations have a standard deviation of 0.436 of the total. This unevenness of scale is shown by Tables 7.1 to 7.5; a far greater change in association is noticeable between Tables 7.1 and 7.3 for a smaller change in correlation coefficient than between Tables 7.3 and 7.5; i.e. a coefficient of 1.0 is much more than twice as good as one of 0.5. Attempts are sometimes made to explain correlation in terms of chances, and to state that if the coefficient for two characters is 0.6 (say), and knowing one of them we use the regression formula to guess at the other, we shall be right six times in ten trials. Such an explanation is nonsense, and the truth is that the standard error of our guess will be reduced to $\sqrt{1 - 0.6^2} = 0.8$ of what it would have been had we not used the formula. Such a reduction in standard deviation may be further interpreted in terms of reduction in frequencies of estimates deviating from the actual values by given amounts, on referring to section 2.52.

The following typical cases may assist the reader. Yule (1927) gives the correlation coefficient between the ages of husbands and wives in England and Wales as 0.91, and that between the age and the standard of elementary school boys is of the same order (Jones, 1910). The likeness between parents and children is expressed by a coefficient of about 0.47 as far as height and several other physical characters are concerned (Snow, 1911), and that between cousins or uncles and nephews, which is so small as scarcely to be noticeable to the "man in the street," has a correlation coefficient of about 0.26. Similarly, taking periods of six days as units, the correlation between the rainfall and hours of bright sunshine in Hertfordshire is negative (i.e. increased rainfall is associated with decreased sunshine) and about 0.2.

TABLE 7.7

Correlation     Ratio of Variances    Ratio of Standard      z'*
Coefficient     (Residual/Total)      Deviations
                                      (Residual/Total)
zero            1.00                  1.000                  zero
0.2             0.96                  0.980                  0.20
0.4             0.84                  0.917                  0.42
0.6             0.64                  0.800                  0.69
0.7             0.51                  0.714                  0.87
0.8             0.36                  0.600                  1.10
0.9             0.19                  0.436                  1.47
1.0             zero                  zero                   infinity

* The column under z' will be referred to in the next chapter.
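The columns of Table 7.7 follow from r alone, which makes the unevenness of the scale easy to explore. The sketch below is illustrative Python; it assumes that the z' column is Fisher's transformation $z' = \tfrac{1}{2}\log_e\{(1+r)/(1-r)\}$ (Python's `math.atanh`), an identification checked here only against the tabulated values, since the text defers z' to the next chapter. The function name is hypothetical.

```python
import math

def table_7_7_row(r):
    """Residual/total ratios of variances and standard deviations for r,
    plus the z' column, assumed here to be atanh(r); infinite at |r| = 1."""
    variance_ratio = 1.0 - r * r
    sd_ratio = math.sqrt(max(variance_ratio, 0.0))
    z_prime = math.atanh(r) if abs(r) < 1.0 else math.inf
    return variance_ratio, sd_ratio, z_prime

for r in (0.0, 0.2, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0):
    v, s, z = table_7_7_row(r)
    print(f"{r:4.1f}  {v:5.2f}  {s:6.3f}  {z:6.2f}")
```

The printed rows agree with Table 7.7 to the decimals shown, and the last column makes plain how sharply the scale stretches as r approaches unity.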

The uses (and abuses) of the coefficient of correlation are many, but it may always safely be regarded as a constant descriptive of the properties of the sample. For instance, Weldon found that the correlation between number of pistils and stamens of late flowers of Table 7.2 was 0.75, while for the early ones, Table 7.3, it was 0.51; that change in the strength of relationship is as much a characteristic of the flowers as a change (say) in mean height of the plants. Fisher and Hoblyn (1928) give tables to show that the correlation between maximum and minimum temperature is about 0.71 in January and 0.30 in August; this change is gradual from month to month and is a quality of the climate.

If a linear regression formula is being used to predict one character from another, the correlation coefficient may safely be used to define the accuracy of the prediction; as has been shown, the standard deviation of the differences between actual and predicted values is $\sqrt{1 - r^2}$ times the standard deviation of the dependent variable.

A rather more dangerous use of this constant, however, is as evidence of causation. If two quantities are associated (as shown by the correlation coefficient), the inference often made is that one is a cause and the other an effect. Such an inference is often erroneous even when dealing with quantities susceptible to such close control that the correlation coefficient is unity, but it is particularly unsafe when there are uncontrolled variations and the relationship is not exact. Often two quantities are both affected in the same way by a third, so that they appear to be related, when actually neither, if altered independently, would have any effect on the other. For instance, Yule refers to the fact that the proportion of marriages contracted outside the Church of England has for many years been increasing, while the average age at death has also been increasing, and there is a positive correlation; but no one supposes that there is a causal relationship, and that a law prohibiting the solemnisation of marriages in churches would have the effect of improving the longevity of the nation. When investigating causation, it is usually well to decide first on other grounds that a causal relationship between two factors is likely, and then to conduct a close analysis of other possible factors before using the correlation coefficient as evidence. Care, common sense, imagination, and a technical knowledge of the subject to which it is applied are particularly necessary in this use of correlation. We shall in Chapter XI discuss methods of analysing data for, and eliminating, other factors.

When considered as the expression of the relationship between two quantities (the second method of approach above), the correlation coefficient merely measures the importance of the variations in one quantity associated with the other (the independent variable) relative to all other variations, and if the variance of the independent variable can be controlled, either experimentally or by selection, the correlation can be altered. It is thus not a physical quantity in the way the regression is. To demonstrate this, we will suppose the independent variable to be x, the dependent one y, the regression a, the correlation coefficient r, and the standard deviations $\sigma_x$ and $\sigma_y$. Then if $\sigma_r$ is the standard deviation of the residual deviations of y from the regression line, which we may suppose to be constant for all values of x,

$$\sigma_r = \sigma_y\sqrt{1-r^2}.$$

But

$$a = r\frac{\sigma_y}{\sigma_x},$$

whence

$$r^2 = \frac{a^2\sigma_x^2}{a^2\sigma_x^2 + \sigma_r^2}.$$

Now $\sigma_r$ and a are assumed to be constant, so that r is not independent of $\sigma_x$. For the relationship between numbers of pistils x and stamens y, r = 0·749, $\sigma_x$ = 3·388, $\sigma_y$ = 3·298, $\sigma_r$ = 2·185, a = 0·729 and $a\sigma_x$ = 2·470. If now we had selected flowers so that the standard deviation of the number of pistils had been reduced by one-half, we should have reduced the correlation coefficient from 0·749 to 0·492, and we should have increased it to 0·914 if we had been able to increase the variation of pistils so as to double the standard deviation (these results may be obtained by substitution in the above formula). The same consideration applies when the effect of some factor like temperature on some variable quantity is being investigated experimentally; by altering the range of temperatures, the correlation coefficient may be altered considerably, and so it is a result of the arbitrary experimental conditions. When such conditions are under the control of the experimenter, the amount of variation in x should be made as great as possible so as to make r large; we shall see in the next chapter that this gives a maximum precision to the measurement of a. A low correlation coefficient does not necessarily mean that x is incapable of having an important influence on y; it may mean that an insufficiently large range of x has been tried. Or, if x cannot be varied, it may be that although its effect on y as measured by the correlation coefficient is small relative to disturbing factors, its absolute effect as measured by the regression coefficient is important. The correlation coefficient between the yield of some crop over several plots and quantities of fertiliser used may be low, but if the regression is appreciable, the total advantage of the use of fertiliser by the whole agricultural industry will be considerable. In such instances, except as an indication of the existence of association, the coefficient of correlation is an unsuitable constant, and that of regression is more suitable.
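The substitution can be sketched in a few lines of Python. The sketch holds a and $\sigma_r$ fixed at the values quoted for pistils and stamens and rescales only $\sigma_x$; the function name is illustrative.

```python
import math


def r_from_regression(a, sigma_x, sigma_r):
    """Correlation implied by r^2 = a^2 sigma_x^2 / (a^2 sigma_x^2 + sigma_r^2)."""
    explained = (a * sigma_x) ** 2
    return math.sqrt(explained / (explained + sigma_r ** 2))


# Pistils (x) and stamens (y): constants quoted in the text
a, sigma_x, sigma_r = 0.729, 3.388, 2.185

results = {}
for factor in (0.5, 1.0, 2.0):
    # a and sigma_r are held fixed; only the spread of x is changed
    results[factor] = r_from_regression(a, factor * sigma_x, sigma_r)
    print(f"sigma_x multiplied by {factor}: r = {results[factor]:.3f}")
```

With the constants as quoted, the unchanged and halved spreads reproduce 0·749 and 0·492; doubling gives about 0·915 against the text's 0·914, a difference only in the last figure.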

Those whose previous outlook has been physical often prefer to regard the scatter of points on a scatter diagram as being due to disturbing factors akin to experimental errors, and as obscuring an exact physical relationship which is assumed to underlie the data. To such people the regression obtained by the method of least squares appears to represent the best possible approach to the true relationship, and when they realise that there are two regression lines, they are puzzled to know which to take. Such an interpretation of a scatter diagram or correlation table may be a little dubious, and in any case, without additional knowledge, it is not possible to arrive at a theoretically satisfactory unique line. It is safer to regard all the features of the table, including the scatter, as equally real properties of the sample, and to regard the means, variances, correlations and regressions as so many constants descriptive of some of those properties.

To sum up, there are three important constants that express the properties of a correlation table:

(1) The correlation coefficient, which measures the importance of the variation in y associated with x relative to the total variation;

(2) The regression coefficient, which measures the average amount of increase or decrease in y per unit increase in x; and

(3) The residual variance, which measures the scatter of values of y about the regression line.

For different populations, these constants are independent in that 
a high value of one constant does not necessarily mean a high or 
low value of either of the others. 

Estimation of Coefficients 

7.4. The method of maximum likelihood may be applied to determining estimates of the parameters of the normal frequency surface of section 7.21. For $\bar{x}$, $\bar{y}$, $\sigma_x$ and $\sigma_y$ it gives the estimates already deduced for single distributions, and for r it gives

$$r = \frac{S(x-\bar{x})(y-\bar{y})}{\sqrt{S(x-\bar{x})^2\,S(y-\bar{y})^2}}, \qquad (7.4)$$

where S is the summation over all individuals.

To be consistent with the earlier treatment of frequency distributions we should denote the population value by $\rho$ and describe the above value of r as an estimate obtained from a sample.

From equations (7.1), (7.11) and (7.4) we have for the regression coefficients

$$\text{Regression of } y \text{ on } x = r\frac{\sigma_y}{\sigma_x} = \frac{S(x-\bar{x})(y-\bar{y})}{S(x-\bar{x})^2} \qquad (7.5)$$

and

$$\text{Regression of } x \text{ on } y = r\frac{\sigma_x}{\sigma_y} = \frac{S(x-\bar{x})(y-\bar{y})}{S(y-\bar{y})^2}. \qquad (7.51)$$

If N is the size of the sample, $S(x-\bar{x})(y-\bar{y})/N$ is called the first product moment (= p, say), whence we see that

$$r = \frac{N}{N-1}\cdot\frac{p}{\sigma_x\sigma_y},$$

where $\sigma_x$ and $\sigma_y$ are calculated correctly on N − 1 degrees of freedom; if the sample is large, N/(N − 1) may be taken as nearly equal to unity.

For moderately small samples, r may be calculated directly from the individual observations, using equation (7.4), but when the sample is large and the data are grouped, a rather more elaborate method of computation, such as is illustrated in the next section, is advisable.
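For ungrouped data the formulas can be applied directly. The Python sketch below (hypothetical data; function and variable names are illustrative) computes r from equation (7.4), the two regression coefficients from (7.5) and (7.51), and checks that r also equals $Np/\{(N-1)\sigma_x\sigma_y\}$ when the standard deviations are taken on N − 1 degrees of freedom.

```python
import math


def correlation_and_regressions(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # S(x - x̄)(y - ȳ)
    sxx = sum((a - mx) ** 2 for a in x)                    # S(x - x̄)^2
    syy = sum((b - my) ** 2 for b in y)                    # S(y - ȳ)^2
    r = sxy / math.sqrt(sxx * syy)                         # equation (7.4)
    y_on_x = sxy / sxx                                     # equation (7.5)
    x_on_y = sxy / syy                                     # equation (7.51)
    # Check via the first product moment p and SDs on N - 1 degrees of freedom
    p = sxy / n
    sd_x = math.sqrt(sxx / (n - 1))
    sd_y = math.sqrt(syy / (n - 1))
    r_check = n * p / ((n - 1) * sd_x * sd_y)
    return r, y_on_x, x_on_y, r_check


x = [3, 5, 2, 8, 6, 7]  # hypothetical
y = [4, 6, 3, 9, 5, 8]
r, y_on_x, x_on_y, r_check = correlation_and_regressions(x, y)
print(round(r, 4), round(y_on_x, 4), round(x_on_y, 4))
```

The product of the two regression coefficients equals $r^2$, which gives a convenient arithmetic check.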



Computation of Coefficients 

7.5. The denominators of the above equations may be computed by the methods of section 1.31 or 6.3, and here we need only concern ourselves with the product term in the numerators. It will be convenient to use similar methods. Corresponding to the equation on p. 37, we have

$$p = \frac{Sxy}{N} - \bar{x}\bar{y}, \qquad (7.6)$$

and corresponding to equation (6.5) we have

$$S(x-\bar{x})(y-\bar{y}) = Sxy - N\bar{x}\bar{y} = Sxy - \frac{T_xT_y}{N}, \qquad (7.61)$$

where $T_x = Sx$ and $T_y = Sy$ are the totals, and x and y may be measured from any convenient origin in the true units. The correcting term in the above equations may be positive or negative, and if either $\bar{x}$ or $\bar{y}$ is zero, this term is zero.

When the observations are grouped, all those in any group may be assumed to be at its centre, and it is convenient to measure the deviations from a group near the middle of each distribution in terms of the sub-range $h_x$ or $h_y$. Then if p' is the product moment in such units,

$$p = h_xh_yp'. \qquad (7.62)$$

Sheppard’s corrections may be applied in calculating the denomi- 
nators of equations (7.4) and (7.5) from grouped data; there is no 
corresponding correction for the numerator. 
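Equations (7.61) and (7.62) say that the product moment about the means does not depend on the origin chosen, and that it scales by $h_xh_y$ when deviations are counted in sub-ranges. A minimal Python check, with made-up group centres and an arbitrarily chosen origin and sub-range (all values hypothetical):

```python
def product_moment_direct(x, y):
    """p = S(x - x̄)(y - ȳ) / N, from the definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def product_moment_shortcut(x, y):
    """p from equation (7.61): (Sxy - T_x T_y / N) / N, for any origin."""
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y))
    return (sxy - sum(x) * sum(y) / n) / n


# Hypothetical group centres in cm
x = [4.05, 4.10, 4.15, 4.20, 4.25, 4.15]
y = [11.1, 11.3, 11.2, 11.5, 11.6, 11.4]

# Arbitrary origins near the middle, and sub-ranges h_x and h_y
x0, h_x = 4.15, 0.05
y0, h_y = 11.3, 0.1
x_coded = [round((v - x0) / h_x) for v in x]   # x' in sub-range units
y_coded = [round((v - y0) / h_y) for v in y]   # y' in sub-range units

p_natural = product_moment_direct(x, y)
p_coded = product_moment_shortcut(x_coded, y_coded)   # p'
print(p_natural, h_x * h_y * p_coded)                 # equation (7.62)
```

The coded calculation works entirely with small integers, which is the point of the method when the arithmetic is done by hand.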

We shall compute the constants for Table 7.1 as an example, where the arbitrary values x' and y' are given in the column or row adjacent to that containing the true values of the variate. The computations are all contained in Table 7.8, where columns (1), (2), (3), (4), (7), (8), (9) and (10) contain all the data for obtaining the means and the second moments, and below the table all the calculations have been carried out as in the first chapter. Sheppard's corrections have been applied, and as the sample is a large one, the second moment has been obtained by dividing the sums of squares by the number of observations (955) instead of by the degrees of freedom.

To find the product moment, we must first find Sx'y'. This could be done directly by writing in the corner of each cell in Table 7.1 its x'y' (the one in the top left-hand corner would be −14 × −11 = +154, the next in the top row would be −14 × −10 = +140, and so on), and then multiplying each x'y' by the number in the cell and adding; in this table, all the squares in the two quadrants containing most observations would have positive products, and those in the other two would have negative ones. A better method, however, is to do the summation in two parts, keeping y' constant first, summing all the x' in each array of y' separately, and then adding the arrays. If we consider any array of y' with $n_y$ observations,

$$Sx'y' = S_s\big(y'\,S'x'\big) = S_s\big(n_y\,y'\,\bar{x}'_y\big), \qquad (7.63)$$

where S, $S_s$ and S' are the summations used previously, and $\bar{x}'_y$ is the mean x' of all the observations in the sth array of y'.
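The two-stage summation of equation (7.63), and the check by arrays of x', can be sketched in Python on a small made-up frequency table (the table below is hypothetical, not Table 7.1). Keys are (x', y') cells and values are the numbers of observations in them.

```python
from collections import defaultdict

# Hypothetical correlation table: {(x', y'): number of observations}
table = {
    (-1, -1): 3, (0, -1): 2,
    (-1, 0): 1, (0, 0): 5, (1, 0): 2,
    (0, 1): 2, (1, 1): 4,
}


def sxy_cell_by_cell(freq):
    """Write x'y' in each cell, multiply by the frequency, and add."""
    return sum(n * xc * yc for (xc, yc), n in freq.items())


def sxy_by_y_arrays(freq):
    """Equation (7.63): total the x' within each array of y' (= n_y times the
    mean x'), multiply by that y', then add over the arrays."""
    x_totals = defaultdict(int)
    for (xc, yc), n in freq.items():
        x_totals[yc] += n * xc
    return sum(yc * total for yc, total in x_totals.items())


def sxy_by_x_arrays(freq):
    """The arithmetic check: the same sum taken over arrays of x'."""
    y_totals = defaultdict(int)
    for (xc, yc), n in freq.items():
        y_totals[xc] += n * yc
    return sum(xc * total for xc, total in y_totals.items())


print(sxy_cell_by_cell(table), sxy_by_y_arrays(table), sxy_by_x_arrays(table))
```

All three routes must give the same total; a disagreement between the last two is the signal of an arithmetic slip that the two-way layout of Table 7.8 is designed to catch.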

The process is carried out in columns (5) and (6) of Table 7.8. In the array y' = −14, $\bar{x}'_y$ is −11/1, so $n_y\bar{x}'_y$ = −11; for y' = −9, $\bar{x}'_y$ = −12/2, so $n_y\bar{x}'_y$ = −12, and so on for the other quantities in column (5); the sum of this column is the sum of all the deviations of x' from the arbitrary origin, and so should equal the sum of column (9). The terms of column (6) are the products of those in columns (1) and (5), and their sum is the required Sx'y'. As a check on the arithmetic, this may be done the other way, summing first for each separate array of x' and adding them. This is done in columns (11) and (12), and if the arithmetic is correct, the totals of columns (11) and (3) and those of columns (6) and (12) should be equal. There is no independent check of columns (4) and (10), and so they should be repeated carefully.

The product moment is calculated below Table 7.8 by applying equation (7.6) to the arbitrary values and then correcting by equation (7.62). In finding r and the regressions, we have used p, $\sigma_x$ and $\sigma_y$ in the natural units of centimetres, but it would be just as easy to maintain arbitrary units throughout and then to correct the regressions by multiplying by $h_y/h_x$ and $h_x/h_y$. The correlation coefficient, being a pure number, needs no such correction.
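As a check on the figures printed with Table 7.8, the following Python sketch starts from the column totals alone (N = 955, the coded totals of x', x'², y', y'² and x'y', and sub-ranges $h_x$ = 0·05 cm and $h_y$ = 0·1 cm) and applies Sheppard's correction of one-twelfth of a squared sub-range (0·083 3 in coded units). The variable names are illustrative.

```python
import math

# Totals for Table 7.8, in coded units (x = length, y = girth)
N = 955
S_x, S_xx = 1241, 12543      # columns (9) and (10)
S_y, S_yy = 1226, 12224      # column (11) total and the y'^2 total
S_xy = 11119                 # column (12)
h_x, h_y = 0.05, 0.1         # sub-ranges, cm
SHEPPARD = 1 / 12            # Sheppard's correction for a second moment, coded units

mx, my = S_x / N, S_y / N                         # x̄', ȳ'
var_x = h_x ** 2 * (S_xx / N - mx ** 2 - SHEPPARD)
var_y = h_y ** 2 * (S_yy / N - my ** 2 - SHEPPARD)
p = h_x * h_y * (S_xy / N - mx * my)              # equations (7.6) and (7.62)

sigma_x, sigma_y = math.sqrt(var_x), math.sqrt(var_y)
r = p / (sigma_x * sigma_y)
reg_y_on_x = p / var_x        # cm of girth per cm of length
reg_x_on_y = p / var_y
print(f"sigma_x = {sigma_x:.5f} cm, sigma_y = {sigma_y:.5f} cm")
print(f"p = {p:.6f} cm^2, r = {r:.3f}")
```

The regression coefficients here are already in natural units; computing them from the coded quantities instead would require the multipliers $h_y/h_x$ and $h_x/h_y$ mentioned above.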

Regression Lines and Sums of Squares: Proofs 

7.6. Let x and y be the individual values of the variates, with means $\bar{x}$ and $\bar{y}$; then it is required to find the constants a and b of the following equation:

$$Y = ax + b, \qquad (7.7)$$

such that $S(y - Y)^2$ is a minimum.

From (7.7)

$$S(y - Y)^2 = S(y - ax - b)^2,$$
mean girth = 11·25 + 0·128 = 11·378 cm

12 224/955 = 12·800 0
−ȳ'² = −1·648 1
= 11·151 9
Sheppard's correction = −0·083 3
= 11·068 6

σy² = hy² × 11·068 6 = 0·110 69 cm²
σy = 0·332 70 cm

Sx'y'/N = 11 119/955 = 11·642 9
−x̄'ȳ' = −1·668 2
p' = +9·974 7

p = hx hy × 9·974 7 = +0·049 874 cm²
r = +0·049 874/(0·332 70 × 0·168 54) = +0·889
TABLE 7.8 (continued)

  (7)     (8)      (9)       (10)      (11)       (12)
  x'      n_x     n_x x'    n_x x'²    S y'      x' S y'

 −11        1      −11        121       −14        154
 −10
  −9
  −8        2      −16        128       −11         88
  −7        6      −42        294       −35        245
  −6       14      −84        504       −75        450
  −5       16      −80        400       −59        295
  −4       30     −120        480      −105        420
  −3       55     −165        495      −126        378
  −2       69     −138        276      −119        238
  −1       85      −85         85       −70         70
   0      128                           +25
  +1      101     +101        101      +106        106
  +2       98     +196        392      +185        370
  +3       91     +273        819      +240        720
  +4      106     +424      1 696      +414      1 656
  +5       54     +270      1 350      +241      1 205
  +6       40     +240      1 440      +215      1 290
  +7       25     +175      1 225      +147      1 029
  +8       15     +120        960      +111        888
  +9       12     +108        972       +91        819
 +10        2      +20        200       +17        170
 +11        5      +55        605       +48        528

Totals    955   +1 241     12 543    +1 226    +11 119

x̄' = +1 241/955 = +1·299 48
hx = 0·05 cm; mean length = 4·125 + 0·065 = 4·190 cm

12 543/955 = 13·134 0
−x̄'² = −1·688 6
= 11·445 4
Sheppard's correction = −0·083 3
= 11·362 1

σx² = hx² × 11·362 1 = 0·028 405 cm²
σx = 0·168 54 cm
and differentiating this with respect to a and b respectively, and equating to zero (the usual method of evaluating constants to reduce a function of them to a minimum), we have

$$-2Sx(y - ax - b) = 0,$$
$$-2S(y - ax - b) = 0;$$

or, expanding term by term and cancelling out common terms,

$$Sxy = aSx^2 + bSx, \qquad Sy = aSx + bS1. \qquad (7.8)$$

Now $Sx = N\bar{x}$, $Sy = N\bar{y}$ and $S1 = N$, and using these we may solve (7.8) to obtain

$$a = \frac{Sxy - N\bar{x}\bar{y}}{Sx^2 - N\bar{x}^2}, \qquad b = \bar{y} - a\bar{x}. \qquad (7.81)$$

We have seen that $Sxy - N\bar{x}\bar{y} = S(x-\bar{x})(y-\bar{y})$ and $Sx^2 - N\bar{x}^2 = S(x-\bar{x})^2$; using these in (7.81) and substituting in (7.7) we obtain

$$Y - \bar{y} = \frac{S(x-\bar{x})(y-\bar{y})}{S(x-\bar{x})^2}\,(x - \bar{x}), \qquad (7.71)$$

which reduces to regression equation (7.1) when the value for r given in equation (7.4) and the standard deviations are substituted for the sums of products and squares.
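The normal equations (7.8) are two linear equations in a and b. As a sketch (hypothetical data, illustrative names), the Python below solves them by Cramer's rule and confirms that the slope agrees with (7.71), that $b = \bar{y} - a\bar{x}$, and that moving either constant away from its solution increases $S(y - Y)^2$.

```python
def least_squares_line(x, y):
    """Solve Sxy = a Sx^2 + b Sx and Sy = a Sx + b N for a and b."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(u * u for u in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = sxx * n - sx * sx
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


def sum_sq_about(x, y, a, b):
    """S(y - Y)^2 for the line Y = a x + b."""
    return sum((v - (a * u + b)) ** 2 for u, v in zip(x, y))


x = [1.0, 2.0, 4.0, 5.0, 7.0]   # hypothetical
y = [2.0, 3.5, 4.0, 6.5, 8.0]
a, b = least_squares_line(x, y)

mx, my = sum(x) / len(x), sum(y) / len(y)
slope_771 = (sum((u - mx) * (v - my) for u, v in zip(x, y))
             / sum((u - mx) ** 2 for u in x))      # equation (7.71)
print(a, slope_771, b, my - a * mx)
```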

Rewriting equation (7.3) by substituting the regression value of equation (7.1) for Y, we have

$$S(y-\bar{y})^2 = S\Big\{(y-\bar{y}) - r\frac{\sigma_y}{\sigma_x}(x-\bar{x})\Big\}^2 + r^2\frac{\sigma_y^2}{\sigma_x^2}S(x-\bar{x})^2. \qquad (7.9)$$

Now

$$\frac{\sigma_y^2}{\sigma_x^2} = \frac{S(y-\bar{y})^2}{S(x-\bar{x})^2},$$

and the second term on the right-hand side of equation (7.9) becomes $r^2S(y-\bar{y})^2$. Also, expanding the first term on that side and substituting for r from equation (7.4), we find that the term reduces to $(1-r^2)S(y-\bar{y})^2$. Equation (7.9) may thus be written

$$S(y-\bar{y})^2 = (1-r^2)S(y-\bar{y})^2 + r^2S(y-\bar{y})^2, \qquad (7.91)$$

and these are the sums of squares entered in Table 7.6.

Further, by substituting for r in equation (7.91) we see that the sum of squares of the deviations of y from the regression line is

$$(1-r^2)S(y-\bar{y})^2 = S(y-\bar{y})^2 - \frac{\{S(x-\bar{x})(y-\bar{y})\}^2}{S(x-\bar{x})^2}. \qquad (7.92)$$

We shall use this in subsequent chapters.

By a similar series of calculations, interchanging x and y, we can obtain the regression line and sums of squares for x.
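A numerical check of the partition (7.91) and of the short form (7.92), on hypothetical data (names illustrative): the sum of squares about the fitted line is computed directly and compared with both expressions.

```python
def partition_sums_of_squares(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    r2 = sxy ** 2 / (sxx * syy)
    slope = sxy / sxx
    about_line = sum((v - my - slope * (u - mx)) ** 2 for u, v in zip(x, y))
    absorbed = r2 * syy                       # absorbed by the regression line
    short_form = syy - sxy ** 2 / sxx         # equation (7.92)
    return syy, about_line, absorbed, short_form, r2


x = [2, 4, 5, 7, 9, 10]                       # hypothetical
y = [3.1, 4.0, 5.2, 5.9, 8.1, 8.4]
total, about, absorbed, short, r2 = partition_sums_of_squares(x, y)
print(total, about + absorbed)
```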



CHAPTER VIII 


SAMPLING ERRORS OF CORRELATION AND

REGRESSION CONSTANTS 


Like all other statistical constants, the correlation coefficient is 
subject to errors of random sampling, so that different samples 
from the same population give varying values which themselves 
form a frequency distribution. In this chapter we shall deal with 
this distribution and with tests of significance based upon it, and 
with the sampling errors of regression constants. 

Populations with Zero Association 

8.1. One of the commonest tests is to see if an observed correlation coefficient is significantly greater than zero, and to do this, the probability that such a value could arise in a random sample from a population having no association has to be found. For samples from such a population, the standard error of the distribution of r is $\pm 1/\sqrt{N-1}$ (or, for large numbers, $\pm 1/\sqrt{N}$ is an adequate approximation), where N is the number of individuals, i.e. the number of pairs of readings of x and y. Unless the sample is very small, the distribution is near enough to normality to justify the use of the criterion that a value of r more than twice the standard error for zero association is above the 0·05 level of significance. For the data of Table 7.5 there are 4 690 observations, and the correlation between reaction time to sight and head length is −0·034, with a standard error of 0·015. The value is a little more than twice its standard error, and so is suggestive of a real, although exceedingly weak, association. It would not be safe, however, to deduce very much from such a result; indeed, a correlation of that sort might conceivably arise if there were a small personal error in measurement such that the observer who overestimated the head length tended slightly to underestimate the reaction time, and if several observers took the measurements, the same one always measuring both characters on the same individual. For the maximum and minimum temperatures of Table 7.4, N = 1 518, r = 0·30, and being more than ten times its standard error

($\pm 1/\sqrt{1518}$ = ±0·026)




coefficient on different levels of significance for samples of various sizes. When there are 12 degrees of freedom (N = 14), r = 0·532 4 lies on the 0·05 level, and making the above transformation, we find that z = 0·778 8 and t = 2·179. Both of these lie on the 5 per cent. level of their respective distributions when $n_1 = 1$ and $n_2 = 12$, or n = 12. We can now check the adequacy of the assumption of the normal distribution of r for moderately large samples: when N = 20 (say), $1/\sqrt{N-1}$ = 0·229, and a correlation coefficient of 0·46 lies on the 0·05 level, while Fisher's tables give the exact value for 18 (= N − 2) degrees of freedom as 0·444; when N = 10 the corresponding values are 2 × 0·333 = 0·67 for the normal distribution and 0·632 for the true distribution, and at N = 5 they are 2 × 0·500 = 1·0 and 0·878. As far as the 0·05 level of significance goes, the assumption of normality is not likely to lead to serious error even for samples as small as 10, but below that it makes the test unnecessarily stringent; the true test based on z or t, however, is correct for samples of all sizes.
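The figures in this paragraph can be reproduced in Python. This sketch assumes the transformation used above is the usual $t = r\sqrt{N-2}/\sqrt{1-r^2}$ on N − 2 degrees of freedom, with $z = \tfrac{1}{2}\log_e t^2$ for $n_1 = 1$, $n_2 = N - 2$; this assumption does reproduce the quoted t = 2·179 and z = 0·778 8 for r = 0·532 4 and N = 14. Inverting the relation gives the value of r on a chosen level of t. The 5 per cent points of t are standard table values; the function names are illustrative.

```python
import math


def t_and_z(r, n_pairs):
    """t = r sqrt(N - 2) / sqrt(1 - r^2) on N - 2 degrees of freedom,
    and z = (1/2) log_e t^2 for n1 = 1, n2 = N - 2."""
    df = n_pairs - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, 0.5 * math.log(t * t)


def critical_r(t_crit, df):
    """Invert the t relation: the r that lies exactly on a given t level."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


t, z = t_and_z(0.5324, 14)
print(f"t = {t:.3f}, z = {z:.4f}")

# 5 per cent points of t, from standard tables, keyed by degrees of freedom
T_05 = {18: 2.101, 8: 2.306, 3: 3.182, 27: 2.052}
for n_pairs in (20, 10, 5):
    exact = critical_r(T_05[n_pairs - 2], n_pairs - 2)
    normal = 2 / math.sqrt(n_pairs - 1)   # twice the standard error 1/sqrt(N - 1)
    print(n_pairs, round(exact, 3), round(normal, 3))
```

The same inversion with 27 degrees of freedom, `critical_r(2.052, 27)`, gives 0·367, the 0·05 value used for Winter's data below.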

Contrary to an opinion often expressed, an association measured by 
the correlation coefficient may be significant in a very small sample 










points which lie anywhere near a straight line does not say that there is no relationship but assumes one and draws his line. This is the more extreme case in which the association is high, and the correlation coefficient is a device for measuring and testing the association more objectively and exactly, and is particularly useful in border-line cases.

We shall take as example some data by Winter (1929), for which corn was grown for twenty-nine years in two series of plots; in the first, the seed for one year was from ears of corn in the previous year's crop in that series, selected because it had high protein content, and in the second the seed for one year was selected from the previous year's crop because of low protein content. The data given in Table 8.1 are the mean percentages of protein for each year, and the coefficients of variation per cent. from ear to ear, and we will call them $y_1$, $v_1$, $y_2$ and $v_2$. The mean protein percentages of the first series ($y_1$) seem to have increased progressively, and those of the second series ($y_2$) appear to have decreased, while it is difficult to see what tendency the variabilities show; we will investigate these trends by means of the correlation coefficients, thus assuming them to be sufficiently well expressed by a linear relationship with time. The correlation coefficients between time and $y_1$, $v_1$, $y_2$ and $v_2$ are +0·862, −0·081, −0·708 and +0·256 respectively.* For 27 degrees of freedom, r = 0·367 lies on the 0·05 level of significance, so that on the average the mean percentages of protein have increased significantly in the first series and decreased in the second with time, but the continued selection has had no measurable effect on the variability.

Populations in which Association is not Zero

8.2. When the association is not zero, the standard error of the correlation coefficient is

$$\frac{1-\rho^2}{\sqrt{N-1}},$$

where $\rho$ is the correlation coefficient in the population; but this is not of much use in testing whether one observed value is greater than another, because (a) unless $\rho$ is small, the distribution of r, the correlation coefficient of the sample, is far from normal, and (b) the population value ($\rho$) is unknown, and if the sample value (r) is substituted the error is likely to be serious. These considerations

* The computation is much simplified if the year 1910 is chosen as zero time, and the others are labelled −1, −2, . . ., +1, +2, . . .; the mean time is then zero.
become particularly important when $\rho$ approaches 0·8, for the sampling curve becomes exceedingly skew; for evidence of this, the reader is referred to A Co-operative Study (1917), in which very full distributions are worked out by K. Pearson and his colleagues from a formula developed by Fisher (1915).*

These difficulties in the use of the sampling distribution of r may be overcome by making a transformation suggested by Fisher (1921), which consists in writing

$$z' = \tanh^{-1} r = \tfrac{1}{2}\{\log_e(1+r) - \log_e(1-r)\}. \qquad (8.3)$$

This is not quite the same as the z found from the ratio of two variances, so we have added the dash to make a distinction. This quantity has a standard error of

$$\frac{1}{\sqrt{N-3}}$$

(i.e. it is independent of the value of z'), and although the distribution is not quite normal, it is nearly so, and for most tests, even on small samples, normality may be assumed. Fisher (1936) gives a table relating z' to r, and some of the values are entered in Table 7.7; when r = 1·0, z' = ∞, and the whole effect of the transformation is to give a more open scale for z' when the association is high, when, as we saw in the last chapter, small changes in r correspond to important changes in the variance absorbed by the regression line.

Significance of Differences between Correlations

8.21. Because its distribution is more nearly normal, Fisher recommends the z' transformation when testing the significance of the deviation of an observed correlation from zero; but the effect of the transformation is small for small values of r, and this step is not important. Differences of observed correlations from some assumed population value, or between pairs of observed correlations, may be

* It may be well to mention here a fairly common fallacy in the use of the standard error for testing the significance of an observed coefficient, r. As an approximation, $(1-r^2)/\sqrt{N-1}$ is often written for the unknown $(1-\rho^2)/\sqrt{N-1}$, and if any observed correlation is greater than twice the former standard error it is judged to be significant; but if the hypothesis being tested is that $\rho = 0$, the true standard error of r is $1/\sqrt{N-1}$, as shown in the previous section.
tested by using z' and its standard error just as we did the mean and its standard error in large samples.

For example, Snow (1911) gives the correlation coefficient between brothers and sisters, measured for a variety of characters, as 0·521, while Fisher (1928) found from 34 pairs of unlike sex from triplet births the correlation coefficient for cubit to be 0·645; can this difference be attributed to random errors? The values of z' are 0·578 and 0·767, and the standard error of the second is

$\pm 1/\sqrt{31}$ = ±0·180

(we assume the first determination to be the exact population value, since it is based on many more observations); the difference is less than twice the standard error and is insignificant.

In this example, deviations of z' of ±0·360 from the population value (0·578) lie on the 0·05 level of significance, giving values of z' on this level of 0·218 and 0·938; the corresponding values of r are 0·214 and 0·734. Thus the chances are about 20 : 1 against a random sample of 34 pairs giving values of r outside these limits, but we regard values anywhere between them as common experience; the deviations of the limits of r from 0·521 are −0·307 and +0·213, and their inequality shows the skewness of its distribution.
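The calculation can be sketched in Python, with `math.atanh` playing the part of Fisher's table of z' and `math.tanh` converting back to r. The variable names are illustrative; the limits are taken as exactly two standard errors either side of the population z'.

```python
import math


def fisher_z(r):
    """Equation (8.3): z' = tanh^-1 r = (1/2){log_e(1 + r) - log_e(1 - r)}."""
    return math.atanh(r)


z_pop = fisher_z(0.521)            # Snow's value, taken as the population value
z_obs = fisher_z(0.645)            # 34 pairs from triplet births
se = 1 / math.sqrt(34 - 3)         # standard error of z'
ratio = (z_obs - z_pop) / se       # under 2, so not significant

# Limits two standard errors either side of the population z', back-transformed to r
lower, upper = z_pop - 2 * se, z_pop + 2 * se
r_lower, r_upper = math.tanh(lower), math.tanh(upper)
print(f"z' = {z_pop:.3f}, {z_obs:.3f}; s.e. = {se:.3f}; ratio = {ratio:.2f}")
print(f"limits for r: {r_lower:.3f} to {r_upper:.3f}")
```

The unequal distances of the two r limits from 0·521 show the skewness directly, while the z' limits are symmetrical by construction. The limits agree with the text to about the third decimal place.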

The following is an example of the use of z' to test the significance of the difference between two observed correlations. Table 7.2 shows the correlation between number of stamens and pistils measured on 373 flowers collected late in the season to be 0·745, while a sample of 268 flowers collected earlier gave a correlation coefficient of 0·506; making the transformation, we find z' = 0·962 and 0·557, and the difference, being 0·405 with a standard error of

$\sqrt{1/370 + 1/265}$ = 0·080,

is quite significant.
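The same test as a small Python function (the name is illustrative): the difference of the two z' values is divided by $\sqrt{1/(N_1-3) + 1/(N_2-3)}$.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Difference of z' values and its standard error sqrt{1/(N1-3) + 1/(N2-3)}."""
    diff = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return diff, se, diff / se


# Late flowers (373) against early flowers (268)
diff, se, ratio = compare_correlations(0.745, 373, 0.506, 268)
print(f"difference = {diff:.3f}, s.e. = {se:.3f}, ratio = {ratio:.1f}")
```

The unrounded difference is 0·404; the text's 0·405 comes from subtracting the rounded z' values. Either way the ratio is about five times its standard error.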

The Combination of Estimates of a Correlation 

8.22. It sometimes happens that we have a number of correlation coefficients estimated from several small samples from populations having the same coefficient, and that we wish to combine them to obtain a mean. This is best done by making the transformation, finding the mean and then re-transforming back to r. If the samples are not of equal size, we naturally give the larger ones more weight than the smaller ones; but the weights should be proportional, not to N, the size of the sample, but to (N − 3).* Thus if $z'_1, z'_2, \ldots$ are the individual estimates based on samples of $N_1, N_2, \ldots$ pairs of observations, the weighted mean is

$$\bar{z}' = \frac{(N_1-3)z'_1 + (N_2-3)z'_2 + \cdots}{(N_1-3) + (N_2-3) + \cdots},$$

and the standard error of $\bar{z}'$ is

$$\frac{1}{\sqrt{(N_1-3) + (N_2-3) + \cdots}}.$$

Table 8.2 contains the correlation coefficients between cephalic index and upper face form for samples of skulls belonging to thirteen races (Tschepourkowsky, 1905). The values of z' are in the fourth column, and their mean, when weighted with (N − 3), and its standard error are

$\bar{z}'$ = −42·408/696 = −0·060 9, and $\pm 1/\sqrt{696}$ = ±0·037 9.

The mean z' is not significant.

If an improved estimate of the correlation is obtained by com- 
bining a large number of very small samples in this way, there is a 
possibility of serious error in z' due to the fact that its distribution is 
not quite normal and there is a slight bias. For this reason, the 
number of coefficients combined should be small compared with 
the average size of sample. 

The Significance of Groups of Correlations 

8.23. If we have a group of correlation coefficients, however, and wish to see if, as a whole, they show association, a test involving z' and $\chi^2$ may be used. Since the standard error of z' is $1/\sqrt{N-3}$, the quantity $z'\sqrt{N-3}$ is distributed approximately normally and

* In the language of section 3.7, $N_1 - 3$, $N_2 - 3$, etc., are the quantities of information given by the samples. In finding the mean of z', these quantities are the weights, and since the quantities are additive, the variance of the mean is the inverse of the total quantity of information.
with unit standard deviation in samples from a population with zero association, and we may calculate

$$\chi^2 = S\{z'^2(N-3)\},$$

where S is the summation over the number of samples available. As we have indicated in section 4.5, this should be distributed as the $\chi^2$ for g degrees of freedom if there are g samples, and we may find P, the probability that random samples from the
TABLE 8.2

Correlation Coefficients between Cephalic Index and Upper Face Index

Race                      Number of    Correlation        z'      z'²(N − 3)
                          Cases, N     Coefficient, r

Australians                   66         +0·089        +0·089       0·50
Negroes                       77         +0·182        +0·184       2·51
Duke of York Islanders        53         −0·093        −0·093       0·43
Malays                        60         −0·185        −0·187       1·99
Fijians                       32         +0·217        +0·221       1·42
Papuans                       39         −0·255        −0·261       2·45
Polynesians                   44         +0·002        +0·002       0·00
Alfourous                     19         −0·302        −0·312       1·56
Micronesians                  32         −0·251        −0·257       1·92
Copts                         34         −0·147        −0·148       0·68
Etruscans                     47         −0·021        −0·021       0·02
Europeans                     80         −0·198        −0·201       3·11
Ancient Thebans              152         −0·067        −0·067       0·67

Total (g = 13)                                                     17·26


hypothetical population would give $\chi^2$ as great as or greater than that observed.

When testing the correlations of Table 8.2 by finding the mean z', we implicitly assumed them to be samples from the same population and regarded the variations in r as due to random errors. It may be, however, that the variations are real racial differences, and that in general there is a correlation between cephalic index and


METHODS OF STATISTICS 



face form, which is sometimes positive and sometimes negative, so that the mean is nearly zero; we can now test this, using z' and $\chi^2$. The values of $z'^2(N-3)$ are given in the fifth column of the table, and their sum gives a $\chi^2$ of 17·26, which for 13 degrees of freedom lies between the 0·1 and 0·2 levels of significance. Thus the combined experience of Table 8.2 lends no support to the view that the two characters are associated, even after making allowance for the possibility of racial differences.
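The calculations for Table 8.2 in sections 8.22 and 8.23 can be reproduced from the sample sizes and correlations alone. The Python sketch below takes N = 77 for the Negroes, the value consistent with the totals 696 and 17·26 used in the text; z' is computed from r with `math.atanh`, so the last figures may differ slightly from the tabulated z'. Names are illustrative.

```python
import math

# Table 8.2: (race, N, r)
TABLE_8_2 = [
    ("Australians", 66, 0.089), ("Negroes", 77, 0.182),
    ("Duke of York Islanders", 53, -0.093), ("Malays", 60, -0.185),
    ("Fijians", 32, 0.217), ("Papuans", 39, -0.255),
    ("Polynesians", 44, 0.002), ("Alfourous", 19, -0.302),
    ("Micronesians", 32, -0.251), ("Copts", 34, -0.147),
    ("Etruscans", 47, -0.021), ("Europeans", 80, -0.198),
    ("Ancient Thebans", 152, -0.067),
]

weights = [n - 3 for _, n, _ in TABLE_8_2]      # quantities of information
zs = [math.atanh(r) for _, _, r in TABLE_8_2]

# Section 8.22: weighted mean z' and its standard error
total_weight = sum(weights)
z_mean = sum(w * z for w, z in zip(weights, zs)) / total_weight
se_mean = 1 / math.sqrt(total_weight)

# Section 8.23: chi-squared for zero association, one degree of freedom per sample
chi2 = sum(w * z * z for w, z in zip(weights, zs))
df = len(TABLE_8_2)

print(f"sum of weights = {total_weight}")
print(f"mean z' = {z_mean:.4f} +/- {se_mean:.4f}")
print(f"chi-squared = {chi2:.2f} on {df} degrees of freedom")
```

P is then read from a table of $\chi^2$ for 13 degrees of freedom, as in the text.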

Having found a mean z' from a number of samples by the method 
of section 8.22, and found it to be significant, we may wish to 


TABLE 8.3

Correlations between Head Length and Breadth

District      Number of    Correlation        z'       z' − z̄'    (z' − z̄')²(N − 3)
              Cases, N     Coefficient, r

                  643         +0·244        +0·249    −0·017 8          0·20
Cairo             802         +0·244        +0·249    −0·017 8          0·25
                  127         +0·330        +0·343    +0·076 2          0·72
Beheira           526         +0·213        +0·216    −0·050 8          1·35
Gharbiya        1 104         +0·316        +0·327    +0·060 2          3·99
Minufiya          717         +0·230        +0·234    −0·032 8          0·77


test if the individual values as a whole differ significantly among themselves, and the $\chi^2$ distribution may be used for this. If z' is an individual transformed correlation, $\bar{z}'$ the mean and N the size of the individual sample,

$$\chi^2 = S\{(z'-\bar{z}')^2(N-3)\},$$

and the number of degrees of freedom (g) is one less than the number of samples (if the mean has been found from them). As an example, we will consider Table 8.3, which contains correlations between head length and breadth for Egyptians native to six districts; the data have been taken from a paper by Orensteen (1920). The weighted mean $\bar{z}'$ is +0·266 8 (giving r = +0·260), and the sum of the last column is $\chi^2$ = 7·28; for 5 degrees of freedom, this lies almost exactly on the P = 0·2 level, and we must conclude that there is no evidence of any difference in correlation among the six districts chosen. Comparing the Beheira and Gharbiya districts, we find the difference in z' is 0·111, with a standard error of

$\sqrt{1/523 + 1/1101}$ = 0·053;

this difference is about twice the standard error, and if the two samples came alone, it might be regarded as significant; but we must remember that it is just about the most significant one that could be taken from the six values, and, bearing in mind the remarks in section 3.33 regarding significances of differences within groups of samples, we should compare the ratio 0·111/0·053, not with 2·0, the 0·05 level for a random pair, but with 2·8, the 0·05 level of the range for six samples (Table 3.3).
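A Python sketch of the heterogeneity test for Table 8.3 and of the Beheira and Gharbiya comparison. The two unnamed districts are given hypothetical placeholder labels; z' is computed from r with `math.atanh`, so the weighted mean differs from the text's +0·266 8 only in the fourth decimal.

```python
import math

# Table 8.3: (district, N, r); the two unnamed rows carry placeholder labels
TABLE_8_3 = [
    ("first district", 643, 0.244), ("Cairo", 802, 0.244),
    ("third district", 127, 0.330), ("Beheira", 526, 0.213),
    ("Gharbiya", 1104, 0.316), ("Minufiya", 717, 0.230),
]

weights = [n - 3 for _, n, _ in TABLE_8_3]
zs = [math.atanh(r) for _, _, r in TABLE_8_3]
z_mean = sum(w * z for w, z in zip(weights, zs)) / sum(weights)

# chi-squared = S{(z' - mean z')^2 (N - 3)} on (number of samples - 1) degrees of freedom
chi2 = sum(w * (z - z_mean) ** 2 for w, z in zip(weights, zs))
df = len(TABLE_8_3) - 1

# Beheira against Gharbiya
z_by_name = {name: math.atanh(r) for name, _, r in TABLE_8_3}
n_by_name = {name: size for name, size, _ in TABLE_8_3}
diff = z_by_name["Gharbiya"] - z_by_name["Beheira"]
se = math.sqrt(1 / (n_by_name["Gharbiya"] - 3) + 1 / (n_by_name["Beheira"] - 3))

print(f"mean z' = {z_mean:.4f} (r = {math.tanh(z_mean):.3f})")
print(f"chi-squared = {chi2:.2f} on {df} degrees of freedom")
print(f"Gharbiya - Beheira: {diff:.3f} / {se:.3f} = {diff / se:.2f}")
```

The last ratio, about 2·1, is the one to set against 2·8 rather than 2·0, for the reason given in the text.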


Significances of Regression Constants 

8.3. We have seen in section 8.1 how the analysis of variance technique and the z- and t-tests may be used for testing the significance of a correlation coefficient; here we shall extend these methods to testing a number of hypotheses concerning regression coefficients and lines. The hypotheses will be given in italics at the head of the sections.


8.31. Hypothesis: That the population value of the regression coefficient is zero.

This is the same as postulating a population with zero correlation, and the test of section 8.1 may be used. However, we shall express the equation in new terms.

In the notation of the last chapter, let the regression line of the sample be

$$Y - \bar{y} = a(x - \bar{x}),$$

and the standard deviation of the residual deviations from the line be s, where

$$s = \sigma_y\sqrt{1-r^2}.$$

Then it is easy by means of equations (7.4) and (7.5) to transform equation (8.2) to

$$t = \frac{a}{s/\sqrt{S(x-\bar{x})^2}},$$

where s is estimated on N − 2 degrees of freedom, that is, $s^2 = S(y-Y)^2/(N-2)$.
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Since t lias been described in section 5.21 as the ratio of a quantity 
to its standard errors we may say that the standard error of the 
regression coefficient in samples from a population with zero asso- 
ciation is 

V S{x — 5 cf 


and the method of section 5.2 may be used to test any significance. 
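The test above can be written as a short Python function. This is a minimal sketch: the name `slope_t_test` and the optional hypothetical value `alpha` are illustrative. With `alpha = 0` it is the test of this section, and a non-zero `alpha` gives the ratio of $a - \alpha$ to the same standard error. The data in the usage line are made up.

```python
import math


def slope_t_test(x, y, alpha=0.0):
    """t for the hypothesis that the population regression coefficient is alpha.

    Returns the sample coefficient a, its standard error s / sqrt(S(x - xbar)^2),
    t = (a - alpha) / standard error, and the N - 2 degrees of freedom.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a = sxy / sxx
    # Residual deviations from the fitted line Y = ybar + a(x - xbar)
    resid_ss = sum((yi - (y_bar + a * (xi - x_bar))) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(resid_ss / (n - 2))
    se = s / math.sqrt(sxx)
    return a, se, (a - alpha) / se, n - 2


a, se, t, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"a = {a:.3f}, s.e. = {se:.4f}, t = {t:.3f} on {df} d.f.")
```

For these made-up data the result is $a = 0.6$ with $t \approx 2.12$ on 3 degrees of freedom. This is the same value given by the familiar form $t = r\sqrt{N-2}/\sqrt{1-r^2}$ with $r = 0.775$.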


8.32. Hypothesis: That the population value of the regression coefficient is $\alpha$.

It is easy to show that the above expression for the standard error of the regression coefficient applies for any population value, $\alpha$. The test of the hypothesis is therefore equivalent to using the t-test to see whether the ratio of $a - \alpha$ to its standard error is significant for $N - 2$ degrees of freedom.

In large samples, where $N$ can be substituted for $N - 2$, the standard error of the regression coefficient reduces to

$$\frac{\sigma_y}{\sigma_x}\sqrt{\frac{1 - \rho^2}{N}},$$

where $\rho$ is the population value of the correlation coefficient, and $\sigma_x$, $\sigma_y$ are the population standard deviations of $x$ and $y$.
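The large-sample form follows in two steps. The derivation assumes that, in large samples, the residual variance about the line is close to its population value $\sigma_y^2(1-\rho^2)$, and that $S(x-\bar x)^2$ is close to $N\sigma_x^2$:

```latex
\begin{aligned}
s^2 &\approx \sigma_y^2\,(1-\rho^2), \qquad S(x-\bar{x})^2 \approx N\sigma_x^2,\\
\frac{s}{\sqrt{S(x-\bar{x})^2}}
  &\approx \frac{\sigma_y\sqrt{1-\rho^2}}{\sigma_x\sqrt{N}}
  = \frac{\sigma_y}{\sigma_x}\sqrt{\frac{1-\rho^2}{N}}.
\end{aligned}
```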


8.33. Hypothesis: That a number of samples are from populations having the same regression coefficient. The residual variance of y is assumed to be constant.

Consider the data of Table 8.4, taken from a paper by Glanville and Reid (1934). They are the crushing strengths of specimens of mortar and concrete made from seven different cements; specimens were broken at three different ages. These data form part of the results of an investigation into the use of tests on mortar as a guide to the behaviour of the cement in concrete. If the corresponding strengths of mortar and concrete are plotted, a marked correlation will be apparent. We may then ask whether the slopes of the regression lines of concrete strength on mortar strength for the three ages are significantly different. We are not concerned here with whether the means for the three ages differ. Nor are we concerned with possible variations in the residual variance of concrete strength from one age to another.

We may eliminate the mean strengths for the three ages by measuring the individual strengths as deviations from those means. We may then either:

(1) fit one regression line to all the deviations, which is equivalent to fitting a series of parallel lines going through the means for the various ages; or
(2) fit separate regression lines for the ages.

The data are plotted in Fig. 11, where the parallel lines of system (1) are full lines and the separate ones of system (2) are dotted. The dotted lines fit the data more closely than the full ones, in the sense that the sum of squares of deviations of concrete strength from them is less. However, the separate regressions are only significantly different if this difference in sums of squares is real after allowing for the degrees of freedom absorbed in fitting and for random errors.

This is tested by calculating residual variances. If the hypothesis is correct, the lines with separate slopes will give the same residual variance as those with the common slope, within the limits of sampling errors.* If the second residual variance is significantly less than the first, the hypothesis is rejected, and it is inferred that the separate lines give a closer representation of the data.

* This follows from the proof by Fisher (1925) mentioned in section 7.23.

TABLE 8.4

Crushing Strength of Mortar and Concrete, lb. per sq. in. ÷ 10

                        One Day            Seven Days       Twenty-eight Days
                   Mortar  Concrete    Mortar  Concrete    Mortar  Concrete
Cement                x       y           x       y           x       y
  A                  263     138         750     507         895     679
  B                  493     278         936     653       1 066     818
  C                  137      49         453     425         632     651
  D                  477     293         893     662       1 100     842
  E                  233     104         545     437         716     603
  F                  568     350         797     735         897     832
  G                  230     141         631     439         846     724
S(x − x̄s)²              164 786             193 074             173 057
S(y − ȳs)²               76 259              99 933              55 482
S(x − x̄s)(y − ȳs)       111 205             117 648              82 646

Fig. 11. (Horizontal axis: Crushing Strength of Mortar, 1 000 lb. per sq. in.)

Suppose separate lines are fitted to $u$ samples containing $N_1, N_2, \dots$ individuals. The residual sums of squares and corresponding degrees of freedom after fitting the lines are then given by equation (7.92) as:

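Sums of squares:

$$S_1(y - \bar{y}_1)^2 - \frac{[S_1(x - \bar{x}_1)(y - \bar{y}_1)]^2}{S_1(x - \bar{x}_1)^2}, \qquad S_2(y - \bar{y}_2)^2 - \frac{[S_2(x - \bar{x}_2)(y - \bar{y}_2)]^2}{S_2(x - \bar{x}_2)^2}, \qquad \text{etc.}$$

Degrees of freedom: $N_1 - 2$, $N_2 - 2$, etc.     (8.4)

Added together, these give the residual sum of squares about all the separate lines, on $N_1 + N_2 + \dots - 2u$ degrees of freedom.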





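When a common slope is fitted to the deviations of all the samples instead, the residual sum of squares and degrees of freedom are:

Sum of squares:

$$S_1(y - \bar{y}_1)^2 + S_2(y - \bar{y}_2)^2 + \dots - \frac{[S_1(x - \bar{x}_1)(y - \bar{y}_1) + S_2(x - \bar{x}_2)(y - \bar{y}_2) + \dots]^2}{S_1(x - \bar{x}_1)^2 + S_2(x - \bar{x}_2)^2 + \dots} \tag{8.41}$$

Degrees of freedom: $N_1 + N_2 + \dots - u - 1$.

The degrees of freedom are made up of the $N_s - 1$ contributed by each sample, minus the one absorbed in fitting the common regression. The sum of squares divided by the degrees of freedom provides another estimate of the residual variance of $y$.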

We cannot test the difference between the two variances obtained from (8.4) and (8.41) for significance by the z-test, because the variances are not independent. However, suppose we subtract the sum of squares of (8.4) from that of (8.41) and divide by the corresponding difference in degrees of freedom. This gives an estimate of the variance that is independent of the estimate from (8.4), and the two may be compared. The differences in sum of squares and degrees of freedom are:

Sum of squares:

$$\frac{[S_1(x - \bar{x}_1)(y - \bar{y}_1)]^2}{S_1(x - \bar{x}_1)^2} + \frac{[S_2(x - \bar{x}_2)(y - \bar{y}_2)]^2}{S_2(x - \bar{x}_2)^2} + \dots - \frac{[S_1(x - \bar{x}_1)(y - \bar{y}_1) + S_2(x - \bar{x}_2)(y - \bar{y}_2) + \dots]^2}{S_1(x - \bar{x}_1)^2 + S_2(x - \bar{x}_2)^2 + \dots} \tag{8.42}$$

Degrees of freedom: $u - 1$.

Then $z$ may be calculated from the ratio of the variances estimated from (8.42) and (8.4). It is tested for significance with degrees of freedom $n_1 = u - 1$ and $n_2 = N_1 + N_2 + \dots - 2u$.
For the data of Table 8.4, the total sum of squares of concrete strengths from equation (8.4) is*

$$76\,259 - \frac{(111\,205)^2}{164\,786} + \dots = 45\,471,$$

and there are 15 degrees of freedom, giving a variance of 3 031. The sum of squares in equation (8.42) is

$$\frac{(111\,205)^2}{164\,786} + \dots - \frac{(311\,499)^2}{530\,917} = 3\,440,$$

and for 2 degrees of freedom this gives 1 720 as an estimate of variance. This is not greater than the variance from (8.4), and the slopes of the lines are not significantly different.
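The arithmetic of equations (8.4) and (8.42) can be checked from the summary sums printed at the foot of Table 8.4. This is a sketch only: the function name is hypothetical, and each age is assumed to contribute the same number of specimens (seven cements).

```python
# Summary sums for each age from Table 8.4 (units of 10 lb. per sq. in.):
# (S(x - xbar)^2, S(y - ybar)^2, S(x - xbar)(y - ybar)), 7 cements per age
TABLE_8_4_SUMS = [
    (164_786, 76_259, 111_205),   # one day
    (193_074, 99_933, 117_648),   # seven days
    (173_057, 55_482, 82_646),    # twenty-eight days
]


def parallel_slopes_test(groups, n_per_group):
    """Residual variance about separate lines (8.4) and the variance from (8.42)."""
    u = len(groups)
    # (8.4): residual sum of squares about u separately fitted lines
    ss_sep = sum(syy - sxy ** 2 / sxx for sxx, syy, sxy in groups)
    df_sep = u * (n_per_group - 2)
    # (8.42): extra reduction gained by giving each sample its own slope
    pooled_sxx = sum(sxx for sxx, _, _ in groups)
    pooled_sxy = sum(sxy for _, _, sxy in groups)
    ss_slopes = (sum(sxy ** 2 / sxx for sxx, _, sxy in groups)
                 - pooled_sxy ** 2 / pooled_sxx)
    df_slopes = u - 1
    return ss_sep / df_sep, df_sep, ss_slopes / df_slopes, df_slopes


v_sep, df_sep, v_slopes, df_slopes = parallel_slopes_test(TABLE_8_4_SUMS, 7)
print(f"separate lines: {v_sep:.0f} on {df_sep} d.f.; "
      f"slopes: {v_slopes:.0f} on {df_slopes} d.f.")
# z = 0.5 * ln(v_slopes / v_sep) is only worth looking up when v_slopes > v_sep
```

The function reproduces the variances 3 031 and 1 720 quoted above.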


8.34. Hypothesis: That two samples are from populations having the same regression coefficient.

The above test may be reduced to the t-test when there are only two samples. In this instance the variance from (8.42) is based on one degree of freedom, and the square root of its ratio to the variance from (8.4) is $t$.

Write $s^2$ for the residual variance estimated from (8.4), and $a_1$ and $a_2$ for the regression coefficients of the two samples. It is easy to show from equations (7.5), (8.4) and (8.42) that

$$t = \frac{a_1 - a_2}{s\sqrt{\dfrac{1}{S_1(x - \bar{x}_1)^2} + \dfrac{1}{S_2(x - \bar{x}_2)^2}}},$$

where $s$ is estimated from $N_1 + N_2 - 4$ degrees of freedom.

This test is parallel to that given in section 5.3 for the difference between two means. It follows that the standard error of the difference between the two regression coefficients may be derived from those of the separate coefficients by the standard method, using equation (3.1), p. 72.
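The t-test for two regression coefficients can be sketched directly from the summary sums. The function name is hypothetical. The usage line applies the sketch to the seven- and twenty-eight-day sums printed at the foot of Table 8.4.

```python
import math


def two_slope_t(sums1, n1, sums2, n2):
    """t for the difference between two regression coefficients.

    Each `sums` is (S(x - xbar)^2, S(y - ybar)^2, S(x - xbar)(y - ybar)).
    """
    (sxx1, syy1, sxy1), (sxx2, syy2, sxy2) = sums1, sums2
    a1, a2 = sxy1 / sxx1, sxy2 / sxx2
    # Residual variance from (8.4) for the two samples together
    df = n1 + n2 - 4
    s2 = ((syy1 - sxy1 ** 2 / sxx1) + (syy2 - sxy2 ** 2 / sxx2)) / df
    se = math.sqrt(s2 * (1 / sxx1 + 1 / sxx2))
    return a1 - a2, se, (a1 - a2) / se, df


# Seven- and twenty-eight-day sums from Table 8.4
diff, se, t, df = two_slope_t((193_074, 99_933, 117_648), 7,
                              (173_057, 55_482, 82_646), 7)
print(f"a1 - a2 = {diff:.3f}, s.e. = {se:.3f}, t = {t:.2f} on {df} d.f.")
```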

As an example, we may consider the seven- and twenty-eight-day results of Table 8.4. The residual variance $s^2$ is obtained from equation (8.4) applied to the last two ages, and is 4 426, based on 10 degrees of freedom. The standard error of the difference between the two regression coefficients is

$$\sqrt{4\,426\left(\frac{1}{193\,074} + \frac{1}{173\,057}\right)} = 0.22,$$

and $t$ is about 0.6, which for 10 degrees of freedom is not significant.

* All results for this example are in units of 10 lb. per sq. in.

8.35. Hypothesis: That a number of samples are from populations having the same regression line. The residual variance of y is assumed to be constant.

We are postulating here not only that the true regression lines are parallel, but also that they coincide. The true means and variances of the independent variable $x$ need not be equal for the populations, and so those of $y$ are not necessarily equal either. However, if our hypothesis is true, the true mean value of $y$ and its variance for any given value of $x$ must be the same for all samples. This hypothesis is therefore not quite the same as postulating that the samples come from the same population. Again, we are not considering the possibility of the residual variance of $y$ changing from sample to sample, and so are not testing it.

The general procedure is the same as that used in section 8.33. We fit the separate lines shown dotted in Fig. 11 and estimate a residual sum of squares and variance from equation (8.4). The common line is shown in Fig. 11 by a broken line; it is fitted to the deviations of the values from the grand means for all samples. If $\bar{x}$ and $\bar{y}$ are the grand means and $S$ is the summation over all individuals, we have for the residuals from this line:

Sum of squares:

$$S(y - \bar{y})^2 - \frac{[S(x - \bar{x})(y - \bar{y})]^2}{S(x - \bar{x})^2} \tag{8.43}$$

Degrees of freedom: $N_1 + N_2 + \dots - 2$.

The difference between the sums of squares in (8.43) and (8.4), divided by the difference in degrees of freedom, $2u - 2$, is the independent estimate of variance. It may be compared with the residual variance from (8.4). There is no further algebraic reduction of the expressions.

We shall test the hypothesis on the data of Table 8.4 by analysing the variance of the concrete strengths. The sum of squares of deviations from the common line is 111 792, the sum of squares of deviations from the separate lines is 45 471, and the difference is 66 321. This difference is on 19 − 15 = 4 degrees of freedom, giving an estimate of variance of 16 580. The residual variance based on 15 degrees of freedom is 3 031. Since this is significantly less than the estimate based on four degrees ($z = 0.85$ lies beyond the 0.01 point), we conclude that the regression lines differ. In this instance, since we have shown in section 8.33 that the lines do not differ significantly in slope, we may say that they differ only in level.

To test for a difference in level, assuming the slope to be constant, it would be legitimate to compare two residuals: the residual after fitting a number of lines having different means but the same slope, and the residual after fitting a common line. Such a test is used in section 11.11.
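The test of section 8.35 needs the grand means, so the sketch below starts from the individual values of Table 8.4 rather than from the summary sums. The function names are hypothetical. The common line is fitted to all 21 pairs about their grand means, as equation (8.43) requires.

```python
import math

# Table 8.4 (units of 10 lb. per sq. in.): (mortar x, concrete y) for cements A-G
TABLE_8_4 = {
    "1 day":   [(263, 138), (493, 278), (137, 49), (477, 293),
                (233, 104), (568, 350), (230, 141)],
    "7 days":  [(750, 507), (936, 653), (453, 425), (893, 662),
                (545, 437), (797, 735), (631, 439)],
    "28 days": [(895, 679), (1066, 818), (632, 651), (1100, 842),
                (716, 603), (897, 832), (846, 724)],
}


def residual_ss(pairs):
    """Sum of squares of y about the least-squares line of y on x, and its d.f."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    return syy - sxy ** 2 / sxx, n - 2


def common_line_test(samples):
    """Compare one common line (8.43) with separate lines (8.4) by z."""
    sep = [residual_ss(pairs) for pairs in samples.values()]
    ss_sep = sum(ss for ss, _ in sep)
    df_sep = sum(df for _, df in sep)                           # N - 2u
    all_pairs = [p for pairs in samples.values() for p in pairs]
    ss_common, df_common = residual_ss(all_pairs)               # N - 2
    ss_diff, df_diff = ss_common - ss_sep, df_common - df_sep   # 2u - 2
    z = 0.5 * math.log((ss_diff / df_diff) / (ss_sep / df_sep))
    return ss_common, ss_sep, ss_diff, df_diff, z


ss_common, ss_sep, ss_diff, df_diff, z = common_line_test(TABLE_8_4)
print(f"common {ss_common:.0f}, separate {ss_sep:.0f}, "
      f"difference {ss_diff:.0f} on {df_diff} d.f., z = {z:.2f}")
```

The sums of squares agree with those quoted above (111 792, 45 471 and 66 321) to within rounding, and $z$ comes out at 0.85.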

Effect of Non-uniformity of Variance 

8.36. The tests for these hypotheses are based on the assumption that the residual variances are the same for the populations sampled. The tests are probably affected appreciably if the residual variances change from one population to another by more than a small amount. Heterogeneity of variance may be suspected when a result of importance is very near the level of significance. In that case it is well to test this assumption by the methods of sections 5.4 and 5.5.

For example, the separate residual variances for concrete of the three ages in Table 8.4 are 243, 5 649 and 3 203, each based on 5 degrees of freedom, and the first is significantly less than the other two. In consequence, some doubt may be thrown on the inferences of sections 8.33 and 8.35. We do not propose to extend the statistical analysis to deal with such instances. If the matter is of importance, we suggest either that more data should be procured, or that here the hypotheses could be tested on the 7- and 28-day values only.

Sheppard’s Corrections 

8.4. In calculating correlation coefficients from samples of grouped data, Sheppard's corrections are often used in determining $\sigma_x$ and $\sigma_y$, and they tend to increase the value of $r$. For this reason it is better to ignore them in making tests of significance, particularly when the grouping is broad, although their use probably leads to an estimate of $r$ which is nearer the population value.
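The correction subtracts $h^2/12$ from a variance computed from data grouped in intervals of width $h$, while the product moment needs no such correction. The short sketch below shows why this raises $|r|$. The function name and the sums in the usage lines are hypothetical.

```python
import math


def grouped_r(sxx, syy, sxy, n, hx, hy, sheppard=False):
    """Correlation coefficient from sums of squares and products of grouped data.

    With sheppard=True each variance is reduced by h^2/12 (h = grouping
    interval); the product moment is left unchanged, so |r| can only increase.
    """
    var_x, var_y, cov = sxx / n, syy / n, sxy / n
    if sheppard:
        var_x -= hx ** 2 / 12
        var_y -= hy ** 2 / 12
    return cov / math.sqrt(var_x * var_y)


# Hypothetical grouped sample: unit class intervals, N = 100
raw = grouped_r(400, 400, 200, 100, 1, 1)
corrected = grouped_r(400, 400, 200, 100, 1, 1, sheppard=True)
print(f"r without corrections {raw:.4f}, with corrections {corrected:.4f}")
```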






The Correlation Ratio

9.1. We have seen that the correlation coefficient measures the closeness of association about a straight line. When the regression departs sensibly from a straight line, the correlation coefficient may be a very inadequate representation of the true state of affairs. It is possible to imagine a case in which the two regression curves are somewhat like those shown in Fig. 12: one is a straight line, and the other a curve more or less symmetrical about a vertical axis. The correlation coefficient would be nearly zero, for the best straight line that could be drawn to fit the curve would be parallel to the x-axis. Yet there might be a high association as expressed by the curve. We are not aware of any such extreme case, but the same considerations apply when the curvature is less marked. In such instances, the only thing to do is to present the correlation table, to give the array … If $x$, $\bar{x}$ and $\bar{x}_s$ are respectively any observation of one variate, the grand mean, and the mean of any individual array, then the correlation ratio of $x$ on $y$ is $\eta_{xy}$, where

Fig. 12.

$$\eta_{xy}^2 = 1 - \frac{S(x - \bar{x}_s)^2}{S(x - \bar{x})^2},$$

and the correlation ratio of $y$ on $x$ is $\eta_{yx}$, where

$$\eta_{yx}^2 = 1 - \frac{S(y - \bar{y}_s)^2}{S(y - \bar{y})^2}, \tag{9.1}$$

$S$ being the summation over all individuals. If we now refer back to Table 6.3, p. 130, we see that

$$1 - \eta_{xy}^2 = \frac{\text{Sum of squares of deviations from array means of } x}{\text{Total sum of squares}},$$

and thus $\eta$ is analogous to the correlation coefficient $r$, since from Table 7.61, p. 158, we see that

$$1 - r^2 = \frac{\text{Sum of squares of deviations from linear regression}}{\text{Total sum of squares}}.$$

Hence $r^2$ is the proportion of the total sum of squares of $x$ associated with $y$ when a straight line is fitted. Likewise, $\eta^2$ is the proportion associated when a series of array means or a curve is fitted. There are two regression lines (for which $r$ happens to be the same), and in the same way there are two series of array means, which require separate correlation ratios. Suppose the sample is large compared with the number of arrays, so that we may neglect the degrees of freedom absorbed. If $\sigma_x$ is the standard deviation of $x$ and $s$ that of the residual deviations, we then have as an approximation:

for linear regression, $s = \sigma_x (1 - r^2)^{1/2}$;

for curved regression, $s = \sigma_x (1 - \eta^2)^{1/2}$.
In large samples, therefore, $\eta$ may be interpreted as a measure of association on practically the same scale as $r$. We have, however, made the proviso that the number of arrays shall be small compared with the size of the sample (say one-fiftieth). The usefulness of this ratio is increased by the fact that the quantity $y$ does not enter into the expression for $\eta_{xy}$, so that the second variate need not even be quantitative. We thus have a measure of association which may be
used in such analyses as those of Tables 6.i and 6.5, pp. 137 and 133. 
From those tables we see that the correlation ratios are : 

Number of ovules per ovary:

$$\eta^2 = 1 - \frac{3\,026.350}{5\,379.775} = 0.437, \qquad \eta = 0.66;$$

Length of cuckoos' eggs:

$$\eta^2 = 1 - \frac{2\,356.21}{3\,429.70} = 0.313, \qquad \eta = 0.56,$$


In the former case, the strength of association comes somewhere between those of the pistils and stamens of the late and early flowers (Tables 7.2 and 7.3, pp. 145 and 146). In the second case, it is practically the same as the relationship between the pistils and stamens for early flowers.
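A minimal sketch shows how $\eta^2$ and $r^2$ differ for arrays whose means lie on a curve symmetrical about a vertical axis, the situation of Fig. 12. The data and function names are hypothetical. The keys of `arrays` play the part of $x$ for $r^2$ but only label the arrays for $\eta^2$, which is why a non-quantitative classification is also acceptable there.

```python
def correlation_ratio_sq(arrays):
    """eta^2 of y on x: 1 - S(y - array mean)^2 / S(y - grand mean)^2.

    `arrays` maps each value (or class) of x to the list of y values in that
    array; the keys are only labels, so they need not be numerical.
    """
    all_y = [y for ys in arrays.values() for y in ys]
    grand = sum(all_y) / len(all_y)
    total_ss = sum((y - grand) ** 2 for y in all_y)
    within_ss = sum(sum((y - sum(ys) / len(ys)) ** 2 for y in ys)
                    for ys in arrays.values())
    return 1 - within_ss / total_ss


def r_squared(arrays):
    """Square of the ordinary correlation coefficient for the same data."""
    pairs = [(x, y) for x, ys in arrays.items() for y in ys]
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    return sxy ** 2 / (sxx * syy)


# Hypothetical arrays whose means lie on a curve symmetrical about x = 0
data = {-1: [1.0, 1.2, 0.8], 0: [0.0, 0.2, -0.2], 1: [1.0, 1.2, 0.8]}
print(f"r^2 = {r_squared(data):.3f}, eta^2 = {correlation_ratio_sq(data):.3f}")
```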

One disadvantage of the correlation ratio, however, is that it is not independent of the number of groups from which it is calculated. It is not entirely a property of the population, nor even of the sample, but depends on the arrangement of the sample into arrays. Suppose there are $m$ arrays, each with $n$ observations, so that the total in the sample is $N = mn$. Let $\sigma_a^2$ and $\sigma^2$ be the variances of array means and of residual deviations in the infinite population. We then see from Table 6.3, p. 130, that approximately

$$\eta^2 = \frac{(m - 1)(\sigma^2 + n\sigma_a^2)}{(m - 1)(\sigma^2 + n\sigma_a^2) + m(n - 1)\sigma^2}, \tag{9.2}$$

and this shows that $\eta^2$ depends on $n$ and $m$. The number in each array ($n$) may be so large that $\sigma^2$ can be neglected in comparison with $n\sigma_a^2$, and $n$ can be written for $n - 1$. Equation (9.2) may then be written

$$\eta^2 = \frac{1}{1 + \dfrac{m}{m - 1}\,\dfrac{\sigma^2}{\sigma_a^2}}, \tag{9.21}$$

and the larger $m$ is, the smaller is $m/(m - 1)$, or the larger is $\eta^2$. Thus it is important to keep the number of groups small so as to avoid over-emphasising the association.

Since $\eta^2$ is related to a ratio of variances, it may be used instead of $z$ as a test for significance of association. However, such a test is not convenient and is seldom used, and we shall not deal with it further.

Tests for Non-Linearity of Regression 

9.2. We shall consider the data of Table 9.1 (Zinn, 1923), relating the loaf volume and percentage protein in a number of Ohio commercial wheats. The interest lies in predicting the loaf volume from the protein content, and so we consider the regression of the former on the latter. The correlation coefficient (found without Sheppard's corrections) is +0.52978, showing that the loaf volume does, on the whole, increase with the protein content. If the table is examined, however, there seems to be a tendency for the loaf volume to remain practically constant for wheats of low protein content. The relationship between the two quantities also seems more marked for wheats with more protein, i.e. the regression appears non-linear. Is this apparent tendency real, or is it due to errors of random sampling? This section is concerned with tests of significance of such departures from linearity.

The general method is based on the analysis of variance. If the regression is non-linear, it will be almost completely expressed by the mean values of loaf volume for the separate arrays. The residual variance calculated by the methods of section 6.4 will then be as low as is possible. A straight regression line fitted to the same data will have a significantly greater residual variance, because the residual will contain variance due to the deviations from linearity. On the other hand, if the regression is linear, it may easily be shown that these two residual variances will be estimates of the same variance. The test for non-linearity therefore resolves itself into a test of significance of the difference between the two estimates.

However, the two estimates are not independent, and so cannot be compared directly by the z-test. Instead, the difference between the sums of squares is divided by the difference in degrees of freedom. This estimate is independent of the residual variance of deviations from the array means, and the two estimates of variance may be compared.

Table 9.2 gives the sums of squares and degrees of freedom for a sample of $N$ observations divided into $m$ arrays, with $n_s$ observations in the $s$th array. All the symbols have the meanings given to them in Chapters VI and VII. All the sums of squares given directly are taken from Tables 6.3 (writing $y$ for $x$ and adapting it for non-uniform array totals) and 7.6. It is recommended that the sum of squares in the second row should be obtained by subtraction.

TABLE 9.1. Protein Content of Wheat and Loaf Volume


TABLE 9.3

Analysis of Variance of Loaf Volumes
(Units, 10 000 c.c.²)

Source of Variation                        Sums of     Degrees of    Mean
                                           Squares     Freedom       Variance
Between arrays:
  Linear regression                          64.66          1          64.66
  Deviations from linear regression          13.55          7           1.94
  Total between arrays                       78.21          8
Residual within arrays                      152.15         91           1.67
Total                                       230.36         99

The analysis for the data of Table 9.1 is in Table 9.3. The variances to be compared give a value of $z$ of $\tfrac{1}{2}\log_e(1.94/1.67) = 0.07$. Since this is well below the 5 per cent level of significance, there is no evidence that the regression departs from linearity.

It will be noticed that the distribution of protein content is far 
from normal, but that does not invalidate our test, which depends 
only on the assumption that the distribution of loaf volume within 
the arrays is practically normal. As far as can be judged from a 
sample of this size, this assumption is justified. 
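The partition of Table 9.3 can be sketched as a function that takes arrays of $y$ values keyed by $x$. The function name and the data in the usage lines are hypothetical. As recommended above, the deviations from linear regression are obtained by subtraction.

```python
import math


def linearity_anova(arrays):
    """Analysis of variance for testing non-linearity of the regression of y on x.

    `arrays` maps each value of x to the list of y values observed there.
    Returns {source: (sum of squares, degrees of freedom)} and z comparing the
    'deviations from linear regression' variance with the 'within arrays' one.
    """
    pairs = [(x, y) for x, ys in arrays.items() for y in ys]
    n, m = len(pairs), len(arrays)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    total = sum((y - y_bar) ** 2 for _, y in pairs)
    linear = sxy ** 2 / sxx
    within = sum(sum((y - sum(ys) / len(ys)) ** 2 for y in ys)
                 for ys in arrays.values())
    deviations = total - within - linear       # found by subtraction
    rows = {
        "linear regression": (linear, 1),
        "deviations from linear regression": (deviations, m - 2),
        "within arrays": (within, n - m),
        "total": (total, n - 1),
    }
    z = None
    if deviations > 0 and within > 0:
        z = 0.5 * math.log((deviations / (m - 2)) / (within / (n - m)))
    return rows, z


# Hypothetical arrays: y observed at four values of x
rows, z = linearity_anova({10: [3.1, 2.9, 3.0], 11: [3.0, 3.2, 3.1],
                           12: [3.6, 3.4, 3.5], 13: [4.4, 4.6, 4.5]})
for source, (ss, df) in rows.items():
    print(f"{source:35s} {ss:8.4f} {df:3d}")
print("z =", z)
```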

Polynomial Regression Curves 

9.3. Sometimes it is convenient to represent the regression of $y$ on $x$ by a smooth curve described by an equation with several constants that have to be determined from the data. From the statement in the footnote in section 7.23, it will be appreciated that such an equation having only a few constants may absorb fewer degrees of freedom than a number of array means. This leaves more degrees of freedom for the estimation of the residual. When there are only a few observations, they cannot be grouped in arrays, and the use of an equation may be essential in the analysis of variance.

The most useful form of equation is the polynomial

$$Y = a + bx + cx^2 + \dots \tag{9.3}$$

According to the method of least squares, the constants $a$, $b$, $c$, etc., may be obtained by solving a series of linear equations

$$\begin{aligned}
Sy &= aS1 + bSx + cSx^2 + \dots \\
Sxy &= aSx + bSx^2 + cSx^3 + \dots \\
Sx^2y &= aSx^2 + bSx^3 + cSx^4 + \dots
\end{aligned} \tag{9.4}$$

using as many equations as there are constants. The summations are taken over all individuals and may be determined by fairly straightforward extensions of the methods of sections 1.3 and 7.5. $S1$ is $N$; $Sx$ and $Sy$ are $N$ times the respective means; and the other sums are $N$ times various moments and product moments measured about the origin. Any convenient arbitrary origin may be used in finding the summations, provided the same origin is used in expressing the result in (9.3). If the data are grouped into arrays, the array means $\bar{y}_s$ may be used instead of the individual values, and the summations are then $Sn_s\bar{y}_s$, $Sn_s$, $Sn_sx_s$, etc., using the same notation as before. A similar series of equations, obtained by interchanging $x$ and $y$ in equations (9.3) and (9.4), gives the regression of $x$ on $y$.
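The normal equations (9.4) can be set up and solved mechanically for any number of constants. The sketch below forms the power sums $S x^p$ and $S x^i y$ and solves the equations by Gaussian elimination with partial pivoting. The function name is hypothetical, and the data in the usage line are made up.

```python
def fit_polynomial(x, y, degree):
    """Least-squares Y = a + b x + c x^2 + ... via the normal equations (9.4).

    Equation i reads S(x^i y) = sum_j coef_j S(x^(i+j)); the system is solved by
    Gaussian elimination with partial pivoting. Returns [a, b, c, ...].
    """
    k = degree + 1
    # Power sums S(x^p) for p = 0 .. 2*degree; S(x^0) = S1 = N
    power = [sum(xi ** p for xi in x) for p in range(2 * k - 1)]
    rows = [[power[i + j] for j in range(k)]
            + [sum(yi * xi ** i for xi, yi in zip(x, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, k):
            f = rows[r][col] / rows[col][col]
            rows[r] = [rv - f * cv for rv, cv in zip(rows[r], rows[col])]
    coef = [0.0] * k
    for i in reversed(range(k)):
        known = sum(rows[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (rows[i][k] - known) / rows[i][i]
    return coef


# Made-up data lying exactly on Y = 1 + 2x + 3x^2
print(fit_polynomial([0, 1, 2, 3, 4, 5], [1, 6, 17, 34, 57, 86], 2))
```

Fitting a higher order means building and solving a larger set of equations from scratch. All the lower constants change as well, which is the inconvenience that the orthogonal form described below avoids.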

The polynomial equation is very adaptable. Depending on the values of the constants $a$, $b$, $c$, etc., it may take a wide variety of forms with comparatively few constants. Indeed, for practical purposes it is a suitable way of expressing nearly all kinds of regression, except those in which the curve of $y$ undulates and has more than two or three maxima and minima.

In the analysis of variance given in section 9.2, a polynomial equation may be used instead of the array means. It absorbs as many degrees of freedom as there are constants, the constant $a$ accounting for the one degree previously attributed to the grand mean of $y$. Curves of the first, second, third and higher orders may be fitted successively if desired. As more and more terms are used, the equation fits the observations more and more closely, until ultimately an equation with as many constants as there are observations fits them perfectly.

This progressive improvement in the closeness of fit may be seen by finding the deviations of the observed values from those given by the regression, then squaring and adding them, i.e. by finding $S(y - Y)^2$. As more and more terms are used, this sum of squares decreases, until it becomes zero when the fit is perfect. Suppose the constants are determined by the method of least squares and the fluctuations in $y$ are purely random. The reduction in sum of squares at each stage then tends to be the same. It is the variance associated with one degree of freedom, differing from the true variance only by the extent of the random errors.

Now suppose the data show a comparatively simple form of variation, predominating over the random fluctuations. The earlier terms of the equation, which express the simpler movements, then reduce the sum of squares by an amount significantly greater than the later ones. Once sufficient terms are included to express this slow movement of $y$ with $x$, the residual deviations are random. Each of the remaining possible terms then reduces the remaining sum of squares by the same amount, within the limits of random errors.

Thus, at each stage in fitting equations with more terms, the reduction in residual sum of squares due to one degree of freedom can be compared with the variance estimated from the residual. If the former is significantly greater than the latter, the last term expresses a real feature of the trend. Eventually the variances associated with the last one or two terms are no longer significantly greater than that estimated from the residual. At that stage it is presumed that the regression has been adequately expressed and the deviations are random. This presumption is only justified, however, if the polynomial is the form appropriate for describing the regression.

This technique of fitting and testing curves should not be followed blindly, and a certain amount of judgment and reference to a diagram may be necessary. The first term or two may account for very little of the variance, while some of the later ones are very important. This would be the case in the curve of Fig. 12, where the linear term would be very small but the second one would be important. On the other hand, if a very large number of terms is required, it may be because the polynomial form is not suitable. A regression equation fitted to any data must not only satisfy these statistical tests; it must also appear appropriate when plotted on a diagram.

Every time an equation with an additional constant is tried, equations (9.4) have to be solved afresh and new values of all the constants obtained. Thus, a third-order curve requires different values of $a$ and $b$ from those appropriate to the second order. The equation (9.3) may, however, be expressed in a so-called orthogonal form. This consists of a number of terms of successively higher orders in $x$, each term being multiplied by a constant to be determined from the data. These terms are independent in that they may be determined one at a time, and the addition of the higher orders does not alter the constants of the lower orders previously determined. This course is easiest when the values of the independent variable are at equal intervals and there is one value of the dependent variable for each value of the independent variable, as in a time series. Fisher (1936) describes a convenient system for doing this. We shall illustrate it by fitting a curve of the third degree to the protein content data of Table 8.1.


9.31. We shall call the mean protein content (the observed values) $y$, and the year measured from the middle year 1910 $t$ (i.e. for 1908, $t = -2$). We will fit a curve of the form

$$Y = a + bt + ct^2 + dt^3$$

to the 29 values ($N = 29$).* Fisher transforms this to

$$Y = A + BT_1 + CT_2 + DT_3,$$

where

$$T_1 = t - \bar{t} = t \quad (\text{since } \bar{t} = 0),$$

$$T_2 = t^2 - \frac{N^2 - 1}{12} = t^2 - 70,$$

and

$$T_3 = t^3 - \frac{3N^2 - 7}{20}\,t = t^3 - 125.8\,t.$$

* If there is an even number of years, $t = 0$ must still be at the centre, so that the values of $t$ must be $\pm 0.5, \pm 1.5, \dots$
The constants are given by the relations

$$A = \frac{Sy}{N},$$

$$B = \frac{12\,S(yT_1)}{N(N^2 - 1)} = \frac{Syt}{2\,030},$$

$$C = \frac{180\,S(yT_2)}{N(N^2 - 1)(N^2 - 4)} = \frac{Syt^2 - 70\,Sy}{113\,274},$$

$$D = \frac{2\,800\,S(yT_3)}{N(N^2 - 1)(N^2 - 4)(N^2 - 9)} = \frac{5\,(Syt^3 - 125.8\,Syt)}{30\,292\,704}. \tag{9.7}$$

The summations $Sy$, $Syt$, etc., have to be found from the data, and Fisher gives a convenient method of evaluating them by a series of additions. For such a small number of observations as we have, however, it is not very much trouble to obtain the summations directly, particularly if a calculating machine is available. For the protein percentages of the two series they are:

Series 1: $Sy_1 = 411.48$, $Sy_1t = +342.98$, $Sy_1t^2 = 28\,408.36$, $Sy_1t^3 = +49\,329.62$;

Series 2: $Sy_2 = 240.78$, $Sy_2t = -195.36$, $Sy_2t^2 = 17\,674.16$, $Sy_2t^3 = -26\,716.68$.

If $y_1, y_2, \dots, y_{29}$ are the individual readings of a series, then

$$Syt = (y_{29} - y_1)\cdot 14 + (y_{28} - y_2)\cdot 13 + \dots + (y_{16} - y_{14})\cdot 1,$$

$$Syt^2 = (y_{29} + y_1)\cdot 14^2 + (y_{28} + y_2)\cdot 13^2 + \dots + (y_{16} + y_{14})\cdot 1^2,$$

$$Syt^3 = (y_{29} - y_1)\cdot 14^3 + (y_{28} - y_2)\cdot 13^3 + \dots + (y_{16} - y_{14})\cdot 1^3,$$

etc., when the values of $t$ are symmetrically placed about zero. The labour is diminished if the terms in the brackets are found first. This gives two series of fourteen values, one for use when multiplying by the odd powers of $t$, and the other when multiplying by the even powers.

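Substituting the above values in (9.7), we obtain for the first series

$$A = \frac{411.48}{29} = 14.188\,97,$$

and for the second series

$$A = \frac{240.78}{29} = 8.302\,76,$$

the other constants following in the same way. The sum of squares accounted for by each term is the square of its constant multiplied by the sum of squares of the corresponding $T$. The term in $T_1$ corresponds to a straight-line law, for

$$B^2 S(T_1^2) = \frac{(Syt)^2}{N(N^2 - 1)/12} = r^2 S(y - \bar{y})^2.$$

The other terms correspond to a mean whose movements become more complicated as more and higher terms in the regression are used.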

We are only concerned with the analysis of sums of squares of deviations from the means. These are given in the "Total" row of Table 9.4 for the two series of protein contents, while the variances of the terms in the regression are given in the higher rows. The variances of the second and third terms are both greater than the residuals. We may test their significance by finding

$$z = \tfrac{1}{2}\log_e\frac{57.95}{0.494}, \quad \text{etc.},$$

and seeing whether these values are above the 0.05 level when $n_1 = 1$ and $n_2 = 25$. Alternatively, we may use the result of section 6.5 and find

$$t = \sqrt{\frac{57.95}{0.494}}, \quad \text{etc.},$$

testing these values for significance with 25 degrees of freedom. Fisher's tables of $t$ give 2.060 as the 0.05 level of significance. Hence the first-order terms are both real (we arrived at this conclusion in section 8.1 when finding the correlation coefficient). So are the third-order terms of the first series and the second-order terms of the second.

TABLE 9.4

Analysis of Variance

                                           Series 1               Series 2
Source of Variation        Degrees of    Sum of                 Sum of
                           Freedom       Squares    Variance    Squares    Variance
First order regression          1         57.95      57.95       18.80      18.80
Second order regression         1          1.38       1.38        5.93       5.93
Third order regression          1          6.31       6.31        0.76       0.76
Residual                       25         12.34       0.494      12.06       0.482
Total                          28         77.98                  37.55

To test whether the cubic regression line as a whole gives a better fit than a linear one, we find

$$z = \tfrac{1}{2}\log_e\frac{3.84}{0.494} = 1.025$$

for the first series, and similarly $z = \tfrac{1}{2}\log_e(3.35/0.482) = 0.97$ for the second series. The degrees of freedom are $n_1 = 2$, $n_2 = 25$. For such a sample, Fisher gives the 1 per cent point of $z$ as 0.8585, so that in both series the cubic fits significantly better than a straight line.

Fig. 13. (Horizontal axis: Year.)

… suggesting that they may both be a result of the same random causes. We shall use these data later, and assume that cubic equations eliminate the systematic trend in protein content sufficiently.
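Fisher's orthogonal scheme reduces the whole of Table 9.4 to a few lines of arithmetic on the summations $Sy$, $Syt$, $Syt^2$ and $Syt^3$. The sketch below uses the $T_2$ and $T_3$ given above and the usual sums of squares of $T_1$, $T_2$ and $T_3$ for $t = -14, \dots, +14$. The function name is hypothetical. The total sums of squares are taken from the "Total" row of Table 9.4, since they cannot be found from these summations alone.

```python
import math

N = 29                                  # t = -14 .. +14, centred on 1910
K2 = (N ** 2 - 1) / 12                  # T2 = t^2 - K2    (K2 = 70)
K3 = (3 * N ** 2 - 7) / 20              # T3 = t^3 - K3 t  (K3 = 125.8)
S_T1 = N * (N ** 2 - 1) / 12                                   # S(T1^2) = 2 030
S_T2 = N * (N ** 2 - 1) * (N ** 2 - 4) / 180                   # S(T2^2) = 113 274
S_T3 = N * (N ** 2 - 1) * (N ** 2 - 4) * (N ** 2 - 9) / 2800   # S(T3^2)


def cubic_analysis(sy, syt, syt2, syt3, total_ss):
    """Constants A-D of Y = A + B T1 + C T2 + D T3 and the analysis of variance."""
    s_yt = [syt, syt2 - K2 * sy, syt3 - K3 * syt]   # S(y T1), S(y T2), S(y T3)
    norms = [S_T1, S_T2, S_T3]
    consts = [sy / N] + [p / q for p, q in zip(s_yt, norms)]
    ss = [p ** 2 / q for p, q in zip(s_yt, norms)]  # one degree of freedom each
    residual_var = (total_ss - sum(ss)) / (N - 4)
    t_values = [math.sqrt(v / residual_var) for v in ss]
    # Cubic against linear: second and third terms together, 2 and N - 4 d.f.
    z_cubic = 0.5 * math.log(((ss[1] + ss[2]) / 2) / residual_var)
    return consts, ss, residual_var, t_values, z_cubic


series1 = cubic_analysis(411.48, 342.98, 28_408.36, 49_329.62, 77.98)
series2 = cubic_analysis(240.78, -195.36, 17_674.16, -26_716.68, 37.55)
for name, (consts, ss, res, t, z) in (("series 1", series1), ("series 2", series2)):
    print(name, [round(v, 2) for v in ss], round(res, 3),
          [round(v, 2) for v in t], round(z, 3))
```

Because each term is fitted independently, adding the cubic term does not disturb the constants already found for the lower orders.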

9.32. There are forms of equation other than the polynomial that may be appropriate in some instances; for example, equations involving trigonometric functions are suitable for expressing many types of periodic fluctuation with time. Many special methods have been devised for smoothing economic time series. These are often favoured either because they describe complicated movements more readily than a simple polynomial does, or because they are a mathematical expression of some economic generalisation. We cannot deal with these here. We would remark, however, that many of these curves cannot be fitted by least squares or some equivalent method. It is then not possible to say how many degrees of freedom are absorbed, and the criteria of fit developed in this chapter may not be applied. In the absence of alternative criteria, this may be a disadvantage. On the other hand, it may perhaps be argued that, in the absence of criteria for testing whether a suitable form of equation has been chosen, it is idle to place too much emphasis on criteria for testing whether the number of terms in the equation is appropriate.

9.4. It may be well to emphasise that all the tests of the preceding sections are extensions of the analysis of variance. They are therefore based on the assumptions, discussed in section 6.7, of normality and homogeneity of the residual variations.


Contingency 


9.5. In section 4.4 we have described the formation of contingency tables when the two characters can only be expressed in broad qualitative groups, and how the existence of association may be tested by calculating $\chi^2$; but we have not given any method of expressing the strength of association. The best constant is probably Pearson’s (1904) mean square contingency coefficient.

Consider the $s$th row and $t$th column in a contingency table; let the number of individuals in the cell common to both be $f_{st}$, the total in the $s$th row be $f_s$, that in the $t$th column be $f_t$, and the grand total be $N$. Then the expected frequency in the cell is $f_s f_t/N$, and we have seen that

$$\chi^2 = SS\,\frac{(f_{st} - f_s f_t/N)^2}{f_s f_t/N}.$$

The mean square contingency is defined as

$$\phi^2 = \frac{\chi^2}{N},$$

and the mean square contingency coefficient as

$$C = \sqrt{\frac{\phi^2}{1 + \phi^2}}.$$

This latter constant was chosen because, as the grouping becomes very fine (i.e. as the number of cells becomes large), $C$ approaches the value of the correlation coefficient $r$, if the population is normal. Unfortunately, like the correlation ratio, $C$ depends on the form and fineness of grouping, but it gives an appreciation of the strength of an association after its reality has been established by the method of Chapter IV, and it does not depend on the order in which the groups are arranged. The contingency coefficient is not reliable when there are fewer than about sixteen cells, and, as in calculating $\chi^2$, cells with very few entries should be avoided as far as possible; the constant is thus only suitable for large samples. Subject to this precaution, the finer the grouping the better, except that the computation becomes very laborious with a very large number of cells. For the smallpox data of Table 4.5,




$\chi^2 = 196.33$, $N = 1\,689$, so

$$\phi^2 = 0.1162 \quad\text{and}\quad C = \sqrt{\frac{0.1162}{1.1162}} = 0.32.$$
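The definitions of $\chi^2$, $\phi^2$ and $C$ translate directly into code. The sketch below is illustrative only: the function names are our own, the table is assumed to be given as a list of rows of frequencies, and it does not enforce the text's warning about sparse cells. The smallpox check uses only the summary figures $\chi^2 = 196.33$ and $N = 1\,689$ quoted above, since the full Table 4.5 is not reproduced here.

```python
import math

def mean_square_contingency(table):
    """Return (chi2, phi2, C) for a contingency table given as a list of rows.

    chi2 = SS (f_st - f_s f_t / N)^2 / (f_s f_t / N)
    phi2 = chi2 / N
    C    = sqrt(phi2 / (1 + phi2))
    """
    row_totals = [sum(row) for row in table]            # f_s
    col_totals = [sum(col) for col in zip(*table)]      # f_t
    n = sum(row_totals)                                 # N
    chi2 = 0.0
    for s, row in enumerate(table):
        for t, f_st in enumerate(row):
            expected = row_totals[s] * col_totals[t] / n
            chi2 += (f_st - expected) ** 2 / expected
    phi2 = chi2 / n
    return chi2, phi2, math.sqrt(phi2 / (1.0 + phi2))

def coefficient_from_chi2(chi2, n):
    """phi^2 and C when only chi^2 and N are known."""
    phi2 = chi2 / n
    return phi2, math.sqrt(phi2 / (1.0 + phi2))

# Smallpox data of Table 4.5, summary figures only.
phi2, c = coefficient_from_chi2(196.33, 1689)
print(f"phi^2 = {phi2:.4f}, C = {c:.2f}")
```

Starting from a full table, `mean_square_contingency` also returns $\chi^2$ itself, so the same cells serve both the test of association and the measure of its strength.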

It would be instructive to combine groups of Tables 7.1 to 7.5 in
various ways so as to form contingency tables with different numbers 
of cells and thus to see how the contingency coefficients approach 
the correlation coefficients. 

The equivalence of the constants of this section to the correlation 
coefficient depends on the assumption that the variates are distributed
normally, but that is no real limitation, for when they can only be 
expressed qualitatively we may reasonably imagine a quantitative 
description on such a scale that its distribution is normal. 


CHAPTER X 


THE FURTHER THEORY OF ERRORS AND PRINCIPLES 
OF EXPERIMENTAL ARRANGEMENT 

Theory of Errors 

10.1. The sampling distributions used in the applications of the
theory of errors in Chapters II to V have all been based on the 
assumption that the individuals in a sample are independent. When, 
however, the variation is heterogeneous and the extent to which the 
sources of variation are represented is not left to chance, the indi- 
viduals are not independent and the simple theory does not apply. 
This condition arises in Table 6.1 (p. 127), where each shrub is
represented equally and the ovaries from one shrub are related. 
The same general theory of statistical inference applies, since it 
does not depend on any assumptions of independence, but the 
sampling distributions and standard errors have to be modified; 
we shall discuss these modifications and some of their consequences 
in the next three sub-sections. 

Random Sampling from Unlimited Field 

10.11. Considering the sample of ovaries of Table 6.1 (p. 127) as representing an infinite population of shrubs, let us estimate the standard error of the mean. Using the simple theory, we should have estimated the standard deviation $\sigma$ from the distribution in the “totals” column, and from Table 6.2 we see that this gives as an estimate of standard error the value

$$\pm\frac{\sigma}{\sqrt{1\,000}} = \pm 0.0734.$$

We know this to be incorrect, since the ovaries are not independent. However, the 10 shrub-means are independent and may be regarded as a small random sample from the infinite population of shrub-means. From Table 6.2, the variance between shrub-means after allowing for weighting is 2.614 92, and the standard error based on 9 degrees of freedom is

$$\pm\sqrt{\frac{2.614\,92}{10}} = \pm 0.5114.$$

Subject to the limitations of a small sample, this estimate is correct, and is about seven times the incorrect estimate.
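The arithmetic of the correct estimate takes only a few lines. This is a sketch under one assumption: the input is the weighted variance between shrub-means, 2.614 92, quoted above, and `se_from_unit_means` is a hypothetical helper name. It reproduces the standard error and its ratio to the naive figure of 0.0734.

```python
import math

def se_from_unit_means(var_between_means: float, m: int) -> float:
    """Standard error of the grand mean from m independent unit (shrub) means."""
    return math.sqrt(var_between_means / m)

se_correct = se_from_unit_means(2.61492, 10)  # ten shrubs, 9 degrees of freedom
se_naive = 0.0734                             # 1 000 ovaries wrongly treated as independent

print(f"standard error from shrub means: {se_correct:.4f}")
print(f"ratio to the naive estimate: {se_correct / se_naive:.1f}")
```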

It may be seen how this difference arises when allowance is made for the fact that the standard error of the mean is made up of two parts, one due to the true variance between shrubs, $\sigma_s^2$, and the other due to the variance within a shrub, $\sigma_r^2$. If there are $m$ shrubs and $n$ ovaries per shrub, the first source of variation is only represented by $m$ individuals and the second by $mn$, and therefore we have

$$\text{Standard Error of Mean} = \sqrt{\frac{\sigma_s^2}{m} + \frac{\sigma_r^2}{mn}}, \qquad (10.1)$$

and from equation (6.3), p. 129, we see that the best estimate of this is $\sqrt{v_s/mn}$. The simple theory assumes that both sources of variation are sampled independently $mn$ times. If we took 1 000 ovaries one at a time from the population of shrubs, leaving it to chance from which shrubs they came, this condition would be satisfied and we should have

$$\text{Standard Error of Mean} = \sqrt{\frac{\sigma_s^2 + \sigma_r^2}{mn}}. \qquad (10.2)$$

If the estimates of $\sigma_s^2$ and $\sigma_r^2$ given in section 6.2 are substituted in (10.2), the result differs only slightly from the first estimate of the standard error given in this chapter.
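Equations (10.1) and (10.2) can be compared side by side. In this sketch the component variances 2.0 and 3.0 are hypothetical values chosen only for illustration (they are not the Table 6.1 estimates), and the function names are our own. With 1 000 observations in all, the two formulas agree when $n = 1$ and diverge sharply as more individuals are taken from each group.

```python
import math

def se_nested(var_between: float, var_within: float, m: int, n: int) -> float:
    """Equation (10.1): m groups, n individuals from each group."""
    return math.sqrt(var_between / m + var_within / (m * n))

def se_unrestricted(var_between: float, var_within: float, m: int, n: int) -> float:
    """Equation (10.2): both sources sampled independently mn times."""
    return math.sqrt((var_between + var_within) / (m * n))

# Hypothetical components, 1 000 observations split into m groups of n.
sb2, sw2 = 2.0, 3.0
for n in (1, 10, 100):
    m = 1000 // n
    print(n, m, round(se_nested(sb2, sw2, m, n), 4), round(se_unrestricted(sb2, sw2, m, n), 4))
```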


Complex samples of the kind exemplified by the 1 000 ovaries,
with 100 from each shrub, are not uncommon, and standard errors 
may easily be estimated if the sample can be divided into a number 
of equal, independent sub-samples or units like the shrubs. For 
example, the 500 pairs of brothers mentioned in section 3.1 are 
500 independent units to which the sampling theory applies. To 
find the standard error of the mean character of the sample of cotton 
hairs collected in tufts as described in section 3.1, it would be 
necessary to measure the mean character separately for each of the 
reduced tufts, say 64 in number, and then calculate the standard 
error of the grand mean of these 64 tuft means. Sometimes, the 
sub-samples themselves are not independent, but have to be grouped 
into yet larger units, and there may be a whole hierarchy of units, 
only the largest of which are independent. Thus, eight tufts of 100 
cotton hairs may be taken from each of eight bales to sample a 
consignment of cotton, and since only the bales are independent,
we must reduce our 6 400 hairs to a small sample of eight bale means. 
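Reducing a hierarchical sample to its independent top-level units can be sketched as a recursive average followed by an ordinary standard error of the unit means. This is an illustration under stated assumptions: the hierarchy is balanced (so averaging level by level is legitimate), the helper name is hypothetical, and the bale/tuft figures are invented for demonstration, not data from the text.

```python
import math
import statistics

def grand_mean_and_se(top_level_units):
    """Grand mean and standard error computed only from the independent top-level units.

    Each unit may be a nested list (tufts within a bale, hairs within a tuft).
    Innermost lists are averaged first and means are then averaged level by level,
    which weights sub-units equally, as in a balanced hierarchy.
    """
    def unit_mean(unit):
        if isinstance(unit, (int, float)):
            return float(unit)
        return statistics.fmean(unit_mean(u) for u in unit)

    means = [unit_mean(u) for u in top_level_units]
    k = len(means)
    se = statistics.stdev(means) / math.sqrt(k)   # based on k - 1 degrees of freedom
    return statistics.fmean(means), se

# Hypothetical example: three bales, two tufts per bale, two hairs per tuft.
bales = [[[10, 12], [11, 13]], [[14, 16], [15, 17]], [[18, 20], [19, 21]]]
grand_mean, se_grand = grand_mean_and_se(bales)
print(grand_mean, round(se_grand, 4))
```

However many hairs are measured, the standard error rests on only as many degrees of freedom as there are bales, less one.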

The same considerations apply to means of observations in a 
time (or space) series, say to the mean annual rainfall obtained from 
the records for a number of years. The total variations are due to 
residual deviations and to a gradual trend, which may perhaps be 
expressible by a polynomial equation with a few constants ; the trend 
corresponds to the shrub or tuft variations in the above examples, 
and is not sampled as many times as there are years. Indeed, it is 
difficult to say how many times it is sampled, and exactly what contribution it makes to the standard error of the mean, and we can only be sure that if the secular trend is at all important, the simple standard error given by $\sigma/\sqrt{N}$ is a serious underestimate of the true one.

The situation may be described in another way by stating that 
when the variation is heterogeneous, and the individuals are not
independent, the variations between the individuals in a sample are 
not of the same kind as those between samples. Consequently, the 
simple sampling theory which estimates the variance between 
samples from that within a sample, cannot apply. This general 
situation may arise in several ways. For example, a laboratory that 
conducts routine determinations of the chemical or biological 
qualities of materials may attempt to estimate its errors by doing 
replicate tests. If the replicates are always done in parallel, going 
through the various preparations and treatments together and on 
the same day, they are not subject to the same errors as replicates 
done on different days and subjected to processes of determination 
that are independent from the beginning. Consequently, standard 
errors calculated from the parallel replicates are not valid estimates 
of the error in the comparisons of independent determinations made 
on different materials on different days. It is essential to ensure 
that the replicates on one material should be subject to the same 
variations as the determinations of the different materials to be 
compared. It may be suggested here that where much routine 
sampling or testing of a particular kind is done, replicate independent 
samples or tests of a number of populations or materials may provide 
a good combined estimate of the error of that kind of determination. 
For example, the replicates may be duplicate determinations, and 
the variance may be estimated from the equation for pairs in 
section 5.1. Before doing this, however, it is well to conduct an 
investigation to see that the estimates of variance provided by the separate sets of replicates do not differ among themselves by an amount greater (or less) than may be attributed to random errors. This may be done by Neyman and Pearson’s test, section 5.5, if there are only a few sets of replicates; if there are many sets, an approximate test is provided by seeing that the variance of the separate estimates of variance is not significantly greater than $2v^2/N$ (section 3.55), where $v$ is the mean variance for all sets and $N$ is
the number of replicates per set. Another line of investigation is to 
test for the existence of a correlation between the variance and mean 
quality of each set. 
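Both calculations in this paragraph can be sketched briefly. Two assumptions should be flagged: the duplicate formula $Sd^2/2k$ (with $d$ the difference within each of $k$ pairs) is the usual estimate for duplicates and is taken here to be the equation for pairs meant in section 5.1; and the homogeneity screen only compares the observed variance of the set variances with $2v^2/N$, without performing a formal significance test. All names and the small data sets are hypothetical.

```python
import statistics

def variance_from_duplicates(pairs):
    """Pooled variance from k duplicate determinations: S d^2 / (2k).

    Each within-pair difference d has variance 2 sigma^2, hence the factor 2.
    """
    k = len(pairs)
    return sum((a - b) ** 2 for a, b in pairs) / (2 * k)

def variance_homogeneity_check(replicate_sets):
    """Observed variance of the per-set variances, and the approximate value 2 v^2 / N
    expected if all sets share one error variance (v = mean variance, N = replicates per set).
    """
    n = len(replicate_sets[0])
    if any(len(s) != n for s in replicate_sets):
        raise ValueError("all sets must contain the same number of replicates")
    variances = [statistics.variance(s) for s in replicate_sets]
    v = statistics.fmean(variances)
    return statistics.variance(variances), 2 * v ** 2 / n

print(variance_from_duplicates([(10, 12), (20, 21), (30, 27)]))
print(variance_homogeneity_check([[1, 2, 3], [2, 4, 6]]))
```

An observed value well above the expected one suggests that the sets of replicates are not subject to a common error variance.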

Economy in Sampling 

10.12. When the variation in a population is heterogeneous, the
arrangement may have an influence on the precision attainable with 
a given size of sample. We shall discuss here the best way of dis- 
tributing observations between and within groups when there are 
those two simple sources of variation. 

If we use the notation of the previous section and let the total number of observations, $mn$, be $N$, the standard error of the mean given by equation (10.1) reduces to

$$\sqrt{\frac{n\sigma_s^2 + \sigma_r^2}{N}}.$$

For a given number of individuals, $N$, this is least when $n$ is smallest, i.e. when $n = 1$. That is to say, unless there are technical objections,
for a constant number of observations the maximum accuracy in the 
mean is obtained when each one is selected separately and indi- 
vidually from a separate group ; and in that case the standard error 
found in the ordinary way from the total variance is applicable. The 
reader will agree that this conclusion is merely common sense. 
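The common-sense conclusion can be seen numerically by holding $N$ fixed and varying the number per group, $n$. The component variances below are hypothetical and chosen only to illustrate the trend; the function name is our own.

```python
import math

def se_fixed_total(var_between: float, var_within: float, total_n: int, n_per_group: int) -> float:
    """SE of the mean for N = m n observations: sqrt((n sigma_s^2 + sigma_r^2) / N)."""
    return math.sqrt((n_per_group * var_between + var_within) / total_n)

# Hypothetical components, N = 1 000 observations in every case.
for n in (1, 2, 5, 10, 100):
    print(n, round(se_fixed_total(2.0, 3.0, 1000, n), 4))
```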

However, technical difficulties often intervene; for instance, we 
have mentioned in section 4.1 that in selecting cotton hairs indi- 
vidually there is a tendency to prefer the long ones, and to overcome 
this, hairs have to be taken in tufts. 

Another modification is necessary when it takes more time to 
increase the number of groups than to increase the number of 
observations within a group. Smith and Prentice (1929), in obtaining 
soil samples for counts of cysts, took a number of “borings” of soil, and then made several counts on each boring; if there were no other considerations, it would be best to take many borings and to make one count on each, but we must take account of the fact that it may take more time to obtain a boring than to make a count on one. If there are $n$ counts on each of $m$ borings, $\sigma_s^2$ is the variance between borings in the infinite population, $\sigma_r^2$ that of counts within a boring, $t$ the time to make a count and $kt$ the time to make a boring, we have for the total time to obtain the $N = mn$ counts,


$$T = mnt + mkt; \qquad (10.3)$$

and the square of the standard error of the grand mean,

$$\sigma_m^2 = \frac{\sigma_s^2}{m} + \frac{\sigma_r^2}{mn}. \qquad (10.31)$$

Now from (10.3),

$$mn = \frac{T}{t} - mk,$$

and substituting this in (10.31) we obtain

$$\sigma_m^2 = \frac{\sigma_s^2}{m} + \frac{\sigma_r^2}{T/t - mk}.$$

In order to find the best way of distributing a constant time $T$ between taking borings and making counts, we must find the value of $m$ which makes $\sigma_m^2$ a minimum. Performing this operation in the usual way by making

$$\frac{\partial \sigma_m^2}{\partial m} = 0$$

and substituting, we obtain

$$n^2 = k\,\frac{\sigma_r^2}{\sigma_s^2}. \qquad (10.32)$$

Smith and Prentice found estimates for $\sigma_s$ and $\sigma_r$ to be 40.5 and 23.1 per cent. of the mean, and if we assume it takes five times as long to take a boring as to make a count, we find from (10.32) that the time is best employed when $n = 1.3$; i.e. under such circumstances it would be better to increase the number of borings than to take more than two counts on each one. Examples of this sort arise in other connections, and equation (10.32), which is only intended
to give a rough guide, may be found useful. Similar considerations 
also arise in estimating other constants. 
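Equation (10.32) can be checked against a direct minimisation of $\sigma_m^2$ for a fixed total time. In this sketch, the Smith and Prentice standard deviations and the assumed ratio $k = 5$ come from the text; the budget $T/t = 1\,000$ and the grid search are arbitrary choices of ours, included only to confirm that the closed form picks the same $n$.

```python
import math

def optimal_counts_per_boring(sd_between: float, sd_within: float, k: float) -> float:
    """Equation (10.32): n = sqrt(k) * sigma_r / sigma_s."""
    return math.sqrt(k) * sd_within / sd_between

def variance_of_mean(m: float, budget: float, k: float,
                     var_between: float, var_within: float) -> float:
    """sigma_m^2 for m borings when the total time is fixed.

    budget is T/t, the total time in units of one count; by (10.3) the number
    of counts left for m borings is mn = T/t - m k.
    """
    counts = budget - m * k
    return var_between / m + var_within / counts

# Smith and Prentice: sigma_s = 40.5, sigma_r = 23.1 (per cent. of the mean); k = 5 assumed.
sd_s, sd_r, k = 40.5, 23.1, 5.0
n_formula = optimal_counts_per_boring(sd_s, sd_r, k)

# Brute-force check over a fine grid of m for an arbitrary budget T/t = 1 000.
budget = 1000.0
grid = [i / 100 for i in range(100, int(100 * budget / k))]   # m from 1.00 to just below T/(kt)
best_m = min(grid, key=lambda m: variance_of_mean(m, budget, k, sd_s ** 2, sd_r ** 2))
n_search = (budget - best_m * k) / best_m

print(f"n from (10.32): {n_formula:.3f}; n from grid search: {n_search:.3f}")
```

The optimum $n$ does not depend on the budget chosen; only $m$ scales with $T$.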

Sampling from Limited Field 

10.13. In considering the sampling theory connected with the data
of Table 6.1, we have regarded the ten shrubs as a random sample
from an infinite population of shrubs. Sometimes, however, there 
may be only ten shrubs in the population under investigation, 
i.e. the field of sampling may be limited, although an infinite popu- 
lation of ovaries may be postulated. Then there are two possible 
methods of sampling. 

In the first, the method of unrestricted random sampling, ovaries 
are taken entirely at random without any regard being paid to the 
shrub from which they come. Then, in a finite sample, the numbers 
of ovaries from the several shrubs are not necessarily the same and 
in general shrub variations contribute something to the sampling 
errors. The “totals” column of Table 6.1 represents this population of indiscriminately mixed ovaries from the ten shrubs, and the variance estimated from this provides the standard deviation from which the standard error of the mean has already been estimated as ±0.0734. The equation for this standard error is equation (10.2), except that when $\sigma_s^2$ is obtained from the sample as in section 6.2 it should be multiplied by $(m - 1)/m$; if the number of shrubs is limited, $\sigma_s^2$ is not estimated, it is determined. In the subsequent discussion we shall neglect this correction to equation (10.2).

The second method is termed by Bowley (1926) selection in strata 
and it consists in dividing the population into a number of parts or 
strata and in taking a sample of individuals at random except that
the number taken from each stratum is proportional to the number 
present. In our example the ten shrubs are the strata and they may 
be presumed to be approximately equal in size. Then since each 
shrub is equally represented in the sample, as far as shrub variations 
are concerned the sample is an exact representation and sampling 
errors are due entirely to within-shrub variations. The within-shrub 
variance is 3.057, so the standard error of the mean of the 1 000 ovaries sampled in strata is

$$\pm\sqrt{\frac{3.057}{1\,000}} = \pm 0.055\,29,$$

or generally, for $N$ ovaries,

$$\pm\frac{\sigma_r}{\sqrt{N}}.$$

In the language of section 3.7, the method of selection by strata is

$$100\,\frac{\sigma_s^2}{\sigma_r^2}\ \text{per cent.}$$

more efficient than purely random selection; for the data of Table 6.1 this difference in efficiency is 76 per cent.
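The figure of 76 per cent. follows from the two standard errors already obtained. The sketch below computes it two ways: from the ratio of sampling variances, and from the components, taking $\sigma_s^2$ as the total variance minus the within-shrub variance. Like the text, it neglects the $(m - 1)/m$ correction. Variable names are our own.

```python
import math

var_within = 3.057                        # within-shrub variance, Table 6.1 data
se_random = 0.0734                        # unrestricted random sampling (section 10.11)
se_strata = math.sqrt(var_within / 1000)  # selection in strata, 1 000 ovaries

# Gain in efficiency, per cent.: ratio of sampling variances minus one.
gain = 100 * ((se_random / se_strata) ** 2 - 1)

# Same figure from the components: sigma_s^2 = total variance - sigma_r^2,
# and the gain is 100 sigma_s^2 / sigma_r^2.
var_total = 1000 * se_random ** 2
gain_components = 100 * (var_total - var_within) / var_within

print(f"SE in strata = {se_strata:.5f}; gain = {gain:.0f} per cent")
```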

The method of sampling by strata has a wide application since 
most populations are limited in extent, even though they may have 
so many individuals that the conception of an infinite population is 
reasonable. For example, an agricultural plot can be divided into 
areas for sampling ears of corn, say; a town can be sampled in
wards and streets in a sociological survey as suggested in section 3.1 ; 
or the products of a factory can be divided into strata representing 
the output of different machines, operatives or periods of time. The 
degree of superiority of sampling by strata to simple random sampling depends on the variance between the strata compared with that within them, and there is an art in choosing the lines of division so as to make the ratio as large as possible. A knowledge of the sources of variation and of their relative importance, such as that given by an investigation using the analysis of variance and by a priori technical knowledge, is of great assistance. When there are no real variations between the strata, i.e. when $\sigma_s^2 = 0$, sampling by strata is at its worst, and even then it is as good as random sampling. It should be noted that unless there are at least two individuals from each stratum, $\sigma_r$, and hence the precision of the sample, cannot be estimated.

The Principles of Experimental Arrangement

10.2. All the foregoing considerations apply to the determination
of a mean; but quite often we are interested, not in the absolute 
value of the mean, but in the extent to which the mean changes with 
some change in conditions (varied, perhaps, experimentally). From 
the point of view of such experiments, the variations are disturbing 
factors which the observer has to average out by taking large numbers 
of observations. If the variability is heterogeneous, however, it is 
often possible so to arrange the experiment that the group differences 
affect all the conditions or treatments equally, so that only the smaller 
residual variations contribute to the errors in the comparisons, giving 
an increased accuracy for the same number of observations. This is 
the principle on which experiments should be designed. 
Harris, in the paper from which the data for Table 6.1 were taken, says that there were three series of ovaries:

A, ovaries from opened flowers which were shaken from the shrub;
B, ovaries from opened flowers which were apparently continuing their development at the time of sampling; and
C, ovaries which had completed their development to mature fruit.

He wished to discover if series C was different from series B and (more particularly) from A, and compared among other things the mean numbers of ovules per ovary.* Now if each series had been taken at random from different lots of shrubs, the standard error of each mean would have been calculated from the variance between trees, and would have been ±0.5114, as calculated in section 10.11. Actually, however, each series was taken from the same shrubs, and by comparing corresponding means, the shrub variation was eliminated. For the purpose of such a comparison, the standard error of the grand mean is obtained from the smaller intra-shrub variance, and is that given in section 10.13 for selection by strata, viz.

$$\pm\sqrt{\frac{3.057}{1\,000}} = \pm 0.055\,29,$$

or about one-ninth of the standard error for a random collection of ovaries from two series of independent trees. This increase of precision may be expressed by the fact that to obtain the same accuracy, the number of ovaries from two independent series of shrubs would have to bear to the number for the restricted arrangement the ratio

$$\frac{(0.5114)^2}{(0.055\,29)^2},$$

or about 86:1 (assuming 100 ovaries to be taken from each shrub).

A simple instance of the advantage of arranging the experiment 
or collection of data so that large variations are eliminated is that of 
comparing the means of two parallel series of observations, and as 
an example we will compare the mean head breadths of termites from 

* Actually, Harris compared the ovules per loculus, and we have taken the 
ovary as the unit because its distribution showed the heterogeneity of the 
variance better. However, Harris only dealt with ovaries with three loculi,
so that it does not affect our argument. 
nests 668 and 670 (Table 6.8, p. 137). First, ignoring the fact that corresponding pairs of breadths are from the same month by treating the 10 observations as though they were independent, we find according to section 5.3,

$$t = \frac{0.1916}{0.0759 \times 0.632} = 4.00.$$

Now, taking the five differences between corresponding pairs from the same month, we use the method of section 5.2 and find

$$t = \frac{0.1916 \times 2.236}{0.081\,09} = 5.28.$$


This increased value of t means that by comparing samples taken in 
parallel months, the significance has been increased. 
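The two values of $t$ can be reproduced from the summary figures above. In this sketch, 0.632 is read as $\sqrt{1/5 + 1/5}$ and 2.236 as $\sqrt{5}$, which is how the numbers in the text fit together; the function names are our own. The unpaired test has 8 degrees of freedom and the paired test only 4, yet the paired $t$ is larger.

```python
import math

def t_independent(mean_diff: float, pooled_sd: float, n1: int, n2: int) -> float:
    """t for two independent means: difference / (s * sqrt(1/n1 + 1/n2))."""
    return mean_diff / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))

def t_paired(mean_diff: float, sd_of_differences: float, n_pairs: int) -> float:
    """t for the mean of n paired differences: mean * sqrt(n) / s_d."""
    return mean_diff * math.sqrt(n_pairs) / sd_of_differences

t_unpaired = t_independent(0.1916, 0.0759, 5, 5)  # 10 readings treated as independent, 8 d.f.
t_pair = t_paired(0.1916, 0.08109, 5)             # five month-by-month differences, 4 d.f.

print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_pair:.2f}")
```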

This fact may be expressed algebraically for the simple case; if there are two series of quantities, $x$ and $x'$, both measured directly as deviations from their grand means, the sum of the squares of their differences is

$$S(x - x')^2 = Sx^2 + Sx'^2 - 2Sxx'.$$

When we wrote a similar equation in section 6.1, we assumed the variates to be independent and the product term to be zero, but if the variates are not independent, we see from equation (7.4), p. 165, that

$$Sxx' = (N - 1)\,r\,\sigma_x\sigma_{x'}.$$

Substituting in the above equation, dividing by $(N - 1)$ and writing $\sigma$ with an appropriate suffix for the standard deviation, we find

$$\sigma_{x - x'}^2 = \sigma_x^2 + \sigma_{x'}^2 - 2r\,\sigma_x\sigma_{x'}.$$
From this, putting $r = 0$ we obtain the formula for the standard error
of a difference between two independent means; but in pairs of 
parallel series like the head breadths of termites, r is positive and 
real and the standard error of the difference is thus reduced. If on 
the other hand r were negative, the standard error of the difference 
would be increased. 
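The identity for $\sigma_{x-x'}^2$ holds exactly for sample quantities computed with the same divisor, which a short check makes concrete. The paired series below are hypothetical numbers, not the termite data, and the helpers are written out by hand so that every divisor is visible.

```python
import math

def sd(values):
    """Sample standard deviation with divisor (N - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def correlation(x, y):
    """Product-moment correlation coefficient r."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sd_of_difference(sd_x, sd_y, r):
    """sigma_{x-x'} = sqrt(sigma_x^2 + sigma_x'^2 - 2 r sigma_x sigma_x')."""
    return math.sqrt(sd_x ** 2 + sd_y ** 2 - 2 * r * sd_x * sd_y)

# Hypothetical paired series, positively correlated.
x = [3.1, 3.4, 3.3, 3.6, 3.5]
y = [2.9, 3.3, 3.0, 3.5, 3.2]
direct = sd([a - b for a, b in zip(x, y)])
via_formula = sd_of_difference(sd(x), sd(y), correlation(x, y))
print(round(direct, 6), round(via_formula, 6), round(sd_of_difference(sd(x), sd(y), 0.0), 6))
```

The last figure, computed as if $r = 0$, is larger, which is the gain from pairing.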
The Further Analysis of Variance

10.3. The whole question of the arrangement and interpretation of
experiments is closely bound up with the analysis of variance, as
we have shown in section 6.6. Usually there are more than two
superimposed factors and the analysis may be complex. For example,
when considering the variations in head breadths of termites of 
Table 6.8, we did two simple analyses into variations between and 
within nests, and between and within months. We see, however, 
that the nest and month variations are superimposed, and one
analysis should separate the variations due to nests, months and 
residual random causes. Such an analysis will be developed in this 
section. 

Since there is a real variation between nests, and since each month is represented equally in each nest, we may eliminate the nest variations by expressing the breadths as deviations from their nest means as in Table 10.1. We may now analyse the variance of these deviations into two parts, between and within months; this is done in Table 10.2. The “total” sum of squares is, of course, the “within a nest” sum of Table 6.9, and arises from 20 degrees of freedom. The “between months” sum is the same as before (the elimination of the nest means has made no difference to the differences between the months), and the degrees of freedom are still 4; the final residual, with 16 degrees of freedom, is thus very small. We now test for association with the months by finding the $z$ for the variances; it is 1.16, and for $n_1 = 4$, $n_2 = 16$ lies well beyond the 0.01 point. Thus, by taking account of and eliminating the nest variations, we have established the month of year effect which previously appeared to be of doubtful significance.

There is no reason why we should not start with the deviations 
from the monthly means, and analyse their variance into two parts, 
between and within nests. Such a process would give the same 
between-nest variance as Table 6.9, and the same residual within 
a nest as is given in Table 10.2 for within a month. This residual is 
common, and represents random deviations that cannot be eliminated. 
The significance of the nest variance then becomes very much greater 
even than before, for 

$$z = \tfrac{1}{2}\left[\log_e 0.034\,36 - \log_e 0.002\,31\right] = 1.35,$$

and $n_1 = 4$ and $n_2 = 16$. We have thus analysed the total variations into three parts, between nests, between months and a residual; each of these can be given a place in a larger table of analysis with
four rows. 

The algebraic relations may be expressed by the equation

$$x - \bar{x} = (\bar{x}_s - \bar{x}) + (\bar{x}_t - \bar{x}) + d, \qquad (10.4)$$

where $x$ is an individual reading for the $s$th row and the $t$th column,*
$\bar{x}_s$ is the mean for the $s$th row,
$\bar{x}_t$ is the mean for the $t$th column,
$\bar{x}$ is the grand mean, and
$d = x - \bar{x}_s - \bar{x}_t + \bar{x}$ is the residual deviation.

Then squaring, and summing for all rows ($S_s$) and columns ($S_t$), we have

$$SS(x - \bar{x})^2 = SS(\bar{x}_s - \bar{x})^2 + SS(\bar{x}_t - \bar{x})^2 + SSd^2,$$

since the sums of the product terms come to zero. These are the sums of squares that go into the analysis table. If there are $n_s$ rows and $n_t$ columns (i.e. $n_t$ observations in each row and $n_s$ in each column, total $N = n_s n_t$), the above equation becomes

$$SS(x - \bar{x})^2 = n_t S(\bar{x}_s - \bar{x})^2 + n_s S(\bar{x}_t - \bar{x})^2 + SSd^2. \qquad (10.5)$$

Expressed in words, this equation states that the sum of squares 
of deviations of individual observations from the grand mean equals 
the sum of squares of deviations of row means multiplied by the 
number of readings in each row, plus the sum of squares of deviations 
of column means multiplied by the number of readings in each
column, plus the sum of squares of residual deviations.
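Equations (10.4) and (10.5) can be followed literally in code: compute row, column and grand means, then the three sums of squares, and check that they add up to the total. As an assumed example we take the fifteen readings for the first pot of run I in Table 10.4, with journeys as rows and cylinders 3, 10, 16 as columns; the function name is hypothetical.

```python
def two_way_sums_of_squares(table):
    """Rows, columns, residual and total sums of squares as in (10.4) and (10.5).

    table: list of rows, all of the same length, one reading per cell.
    """
    n_s, n_t = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_s * n_t)
    row_means = [sum(row) / n_t for row in table]
    col_means = [sum(col) / n_s for col in zip(*table)]
    ss_rows = n_t * sum((xs - grand) ** 2 for xs in row_means)   # n_t readings per row
    ss_cols = n_s * sum((xt - grand) ** 2 for xt in col_means)   # n_s readings per column
    ss_resid = sum((table[s][t] - row_means[s] - col_means[t] + grand) ** 2
                   for s in range(n_s) for t in range(n_t))      # SS d^2
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    return ss_rows, ss_cols, ss_resid, ss_total

# Pot 1, run I of Table 10.4: rows are journeys 1-5, columns cylinders 3, 10, 16.
pot1_run1 = [
    [47, 56, 100],
    [55, 89, 93],
    [35, 57, 56],
    [78, 67, 113],
    [33, 40, 128],
]
ss_rows, ss_cols, ss_resid, ss_total = two_way_sums_of_squares(pot1_run1)
print(round(ss_rows, 2), round(ss_cols, 2), round(ss_resid, 2), round(ss_total, 2))
```

The residual here is computed from the deviations $d$ themselves, so its agreement with total minus rows minus columns is a genuine check of (10.5).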

These are entered in Table 10.3, and symbols are written for the variances. If $\sigma_s^2$, $\sigma_t^2$ and $\sigma_r^2$ are the squares of standard deviations of row, column and residual variations in the infinite population, we obtain as in equations (6.3)

$$v_r \to \sigma_r^2, \qquad v_s \to n_t\sigma_s^2 + \sigma_r^2, \qquad v_t \to n_s\sigma_t^2 + \sigma_r^2. \qquad (10.6)$$

After the significances of the sources of variation have been established by the $z$ test carried out on the values of $v$, it is advisable

* The conception is generalised by writing row for month and column 
for nest. 
to calculate these estimates of the “corrected variances,” as we may call them, as measures of the relative importance of the different sources. The values of $v$ depend on the number of observations per row or column, i.e. on arbitrary features of the sample, and so are of no value in themselves for measuring the objective properties of the population. If a large number of observations were taken individually at random from a population comprised of many rows and columns, of which the few rows and columns treated in the analysis are a random sample, the variance between these individuals would tend to equal the value $\sigma_s^2 + \sigma_t^2 + \sigma_r^2$. If a factor were eliminated as a source of variation, this variance would be reduced

TABLE 10.3
Analysis of Variance

Source of Variation    Sum of Squares                       Degrees of Freedom      Variance
Rows                   $n_t S_s(\bar{x}_s - \bar{x})^2$     $n_s - 1$               $v_s$
Columns                $n_s S_t(\bar{x}_t - \bar{x})^2$     $n_t - 1$               $v_t$
Residual               $S_s S_t\, d^2$                      $N - n_s - n_t + 1$     $v_r$
Total                  $S_s S_t (x - \bar{x})^2$            $N - 1$


by the amount of the corresponding variance. For the head breadths of termites,

$$\sigma_r^2 = 0.002\,31 \quad\text{(residual)},$$

$$\sigma_s^2 = \frac{0.023\,51 - 0.002\,31}{5} = 0.004\,24 \quad\text{(months)},$$

$$\sigma_t^2 = \frac{0.034\,36 - 0.002\,31}{5} = 0.006\,41 \quad\text{(nests)};$$

and the variations due to months and nests are seen to be com- 
paratively important. Sometimes, however, when there are many 
rows or columns, an unimportant variation may be highly significant. 
To interpret these variances in terms of normal frequencies, reference 
may be made to section 2.52. 
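The corrected variances and the two values of $z$ for the termite data follow directly from (10.6) and the three variances quoted in this section. In the sketch below the function names are ours; the divisor 5 is the number of readings per month (five nests) and per nest (five months).

```python
import math

def corrected_variance(v_factor: float, v_residual: float, readings_per_level: int) -> float:
    """From (10.6): v_factor estimates readings_per_level * sigma^2 + sigma_r^2."""
    return (v_factor - v_residual) / readings_per_level

def fisher_z(v1: float, v2: float) -> float:
    """Half the natural logarithm of a variance ratio."""
    return 0.5 * math.log(v1 / v2)

v_r, v_months, v_nests = 0.00231, 0.02351, 0.03436    # termite head breadths

sigma2_months = corrected_variance(v_months, v_r, 5)
sigma2_nests = corrected_variance(v_nests, v_r, 5)
z_months = fisher_z(v_months, v_r)   # n1 = 4, n2 = 16
z_nests = fisher_z(v_nests, v_r)     # n1 = 4, n2 = 16

print(f"{sigma2_months:.5f} {sigma2_nests:.5f} {z_months:.2f} {z_nests:.2f}")
```

The $z$ tests use the raw values of $v$; the corrected variances serve only to compare the sizes of the sources once significance is established.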

The estimates of $\sigma_s^2$ and $\sigma_t^2$ are subject to errors of random sampling, but since these corrected variances are differences between two independent estimates of variance, their sampling distributions
are composite. We shall make no attempt to determine them here, 
since all the tests of significance that are usually necessary can be 
performed on the values of v. 

In computing the residual sum of squares for Table 10.3, it is troublesome to find the actual residual deviations, as we have done above, and provided the arithmetic is carefully checked, it is sufficient to find the sums of squares for the rows, columns and total, and to obtain the residual by subtraction. The procedure is facilitated by extending the methods of section 6.3, performing the following operations:

(1) Sum the squares of the individual observations.
(2) Sum the squares of the row totals and divide the sum by the number of individuals in each row.
(3) Sum the squares of the column totals and divide the sum by the number of individuals in each column.
(4) Square the grand total and divide by the grand total number of observations.

Then it may easily be shown that the sums of squares required for Table 10.3 are:

For rows: the result of operation (2) minus that of (4),
For columns: (3) minus (4),
For the total: (1) minus (4).

The observations may all be measured from some arbitrary origin,
e.g. 3.200 mm. for the head breadths of termites, so that the quantities
to be squared have only three or four significant figures, and
the computations may all be performed easily with the aid only of 
a table of squares. 
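Operations (1) to (4), with the residual obtained by subtraction and an optional arbitrary origin, can be sketched as follows. The function name is hypothetical, and the example reuses the first pot of run I from Table 10.4 (journeys as rows, cylinders as columns). Shifting the origin leaves every sum of squares unchanged, which is exactly why it is a safe way to keep the figures small.

```python
def sums_of_squares_from_totals(table, origin=0.0):
    """Operations (1)-(4) for a two-way table with one reading per cell.

    Readings are first measured from an arbitrary origin. Returns
    (rows, columns, residual, total), the residual found by subtraction.
    """
    data = [[x - origin for x in row] for row in table]
    n_s, n_t = len(data), len(data[0])
    op1 = sum(x * x for row in data for x in row)              # (1)
    op2 = sum(sum(row) ** 2 for row in data) / n_t             # (2): n_t readings per row
    op3 = sum(sum(col) ** 2 for col in zip(*data)) / n_s       # (3): n_s readings per column
    op4 = sum(sum(row) for row in data) ** 2 / (n_s * n_t)     # (4)
    ss_rows, ss_cols, ss_total = op2 - op4, op3 - op4, op1 - op4
    return ss_rows, ss_cols, ss_total - ss_rows - ss_cols, ss_total

# Pot 1, run I of Table 10.4: rows are journeys 1-5, columns cylinders 3, 10, 16.
pot1_run1 = [
    [47, 56, 100],
    [55, 89, 93],
    [35, 57, 56],
    [78, 67, 113],
    [33, 40, 128],
]
print(sums_of_squares_from_totals(pot1_run1))
print(sums_of_squares_from_totals(pot1_run1, origin=50))
```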

10.31. A statistical study of the variations in a population is useful for establishing a sampling scheme, as suggested in section 10.13, and in industrial practice for suggesting in what directions it would be profitable to attempt to control quality. It is usually possible to collect a sample in which the sources of variation are represented in some balanced way, as in Table 6.8, and to carry out an analysis of variance. This represents the type of control by selection and arrangement referred to in the Introduction to this book. Analyses carried out in this connection may be very complicated, and it will be instructive to deal with the following example in detail.

The data in Table 10.4 are from a paper by Gould and Hampton (1936).* From a single pot about eighteen cylinders of spectacle glass are made on any one day or “journey.” Two pots are heated together in the same furnace, and glass may be made on several consecutive days from the same pair of pots, forming a “run.” The quality measured, or variate, is the mean number of seed per unit area of glass, and this is given in Table 10.4 for three cylinders from each pot (the third, tenth and sixteenth in order of manufacture) and for the first five journeys in each of four runs. There is no significance in the numbering of the pots, and the runs are independent, being separated by several weeks and sometimes referring to different furnaces. The separate figures are given in the table, and various totals† have also been computed.

Now each kind of manufacturing unit — cylinder, pot, journey or 
run — is a potential source of variation, since the causes that control 
quality may be consistently different for each unit. Further, since 
the journeys are consecutive days and the cylinders are in order of 
manufacture, there may be trends in quality. Clearly, the variations
in density of seed shown in Table 10.4 are the result of the super- 
imposition of variations from many possible sources. Precisely what 
are these sources ? Which have statistically significant effects ? What 
is their relative importance? To answer these questions we shall 
analyse the variations in Table 10.4 completely.

It would be possible to do this analysis in one step writing down 
equations analogous to equations (10.4) and (10.5), but the notation 
would be complicated and the process would be very difficult to 
follow. We shall break Table 10.4 up into a number of simpler 
tables of kinds that have already been dealt with, and analyse these 
step by step. Readers are warned that they will find this analysis 
very difficult to follow unless they have “at their finger-tips” the
simpler analyses dealt with so far. 

First let us consider the fifteen readings in one pot-run, say the 
first pot and first run. This is statistically similar to Table 6.8 

* From Table IV of their paper. We give here only four of the five runs given there, so as to make the analysis easier to follow. It might lead to some ambiguity in the argument if five runs and five journeys were retained.

† These totals are not total frequencies as given, say, in Table 6.1. They are the sums of the qualities of the individuals included in the totals.
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TABLE 10.4 
* 

Mean Number of Seed per Unit Area 


I 

' * 


Pot I 

Pot 3 


Totals 



Jouniey 


CyKnder 






-KUIl 



1 


Both 

Pots 



3 

10 

16 

3 

10 

16 

Pot I 

Pot 2 


I 

47 

56 

100 

s 52 

61 

88 

203 

201 

404 


2 

55 

89 

93 

j 49 

62 

97 

237 

208 

445 

I 

3 

35 

57 

56 

i 34 

60 

72 

148 

166 

314 


4 

78 

67 

113 

' 47 

93 

118 

258 

258 

516 

- 

5 

33 

40 

128 

< 16 

i 

r 

29 

130 

201 

175 

376 


Totals 

248 

309 

490 

198 

305 

505 

1 047 ; 

, 1 

1 i 

I 008 

Wm 

B 


I I 

52 

66 

36 

. 65 

i 

t 

i 

So 

40 

1 154 

185 

339 


2 

21 

61 

49 

122 

97 

79 

i 131 

298 

429 

2 

3 

31 

39 

35 

45 

1 

72 

1 95 

I7I 

266 


4 

43 

73 

53 

109 

i 120 

80 

167 

309 

476 


5 

37 

51 

67 

67 

> 8s 

63 

' 155 

315 

370 


Totals 

184 

289 

229 

408 

' 436 

334 

1 

703 

I 178 

1 880 


I 

50 

61 

60 

75 

i 139 

130 

171 

344 

515 


2 

33 

27 

49 

46 

- 58 

63 

109 

167 

276 

3 

3 

24 

39 

24 

' 15 

33 

39 

87 

87 

174 


4 

18 

18 

43 

22 

16 

19 

- 79 

57 

136 


5 

28 

42 

28 

37 

19 

22 

98 

68 



Totals 

153 

187 

204 

185 

265 

273 

544 

723 

1 267 


I 

24 

34 

43 

46 

66 

24 

lOI 

136 

237 


2 

24 

49 

42 

^ 40 

117 

105 

115 

262 

377 

4 

3 

21 

21 

51 

30 

28 

34 

93 

93 

185 


4 

21 

69 

48 

36 

64 

53 

138 

153 

291 


5 

76 

48 

42 

39 

60 

78 

166 

177 

343 


Totals 

166 

221 


191 

335 

294 

613 

820 

1433 
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TABLE 10.5 
Analysis of Variance 


Source of Variation 

Sum of Squares 

Degrees of 
Freedom 

Variance 

t 

{d) Within Pots 
(i) Between cylinders . . 

32 133- S3 

16 

I 383*28** 

^ (2) Bet^^^een journeys . . 

37 916*66 

32 , 

I 184-90** 

f (3) Residual within pots 

18 398*14 

64 

387-47 


(4) Total .. .. 78447*33 II3 


(b) Between Pots 


(5) Between runs 

(6) Residual between 

13 679*90 

3 

4559-96 

jpots mm » • 

10 099*57 

4 

2 524*89** 

(7) Total 

23 779-47 

7 

3 397*07** 


(c) Between Cylinders 


(.8) Common to all runs 
(9) Common to both 

9 132*88 

3 

4566-44 

pots in run less (8) 

II 532*72 

6 

I 922*12 


20 665*60 

8 

2 5^3 *20** 

(10) Specific to pot 

I 466*93 

8 

183-37 

(ii) Total 

22 132-53 

16 

1 

(d) Between Journeys 



1 

(13) Common to all nms 
(13) Common to both 

9 684*00 

4 

2 421*00 

pots in run, less 
(12) 

18 650*07 

12 

I 554*17 


28 334-07 

16 

I 770*88** 

(14) Specific to pot 

9582-59 

16 

598*91* 

(15) Total 

37 916*66 

32 

— 
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(p. 137), and the variation may be divided into three parts: between 
cylinders with two degrees of freedom, between journeys with four 
degrees, residual with eight degrees, and there are fourteen degrees 
for the total. This process may be carried out for the eight pot- 
runs, and the sums of squares and degrees of freedom added, giving 
the data in the upper part of Table 10.5. 
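For readers who want to check this step by machine, here is a minimal Python sketch. The book itself uses only hand computation, and the function name `two_way_split` is hypothetical. The sketch takes the fifteen readings for pot 1 of run 1 from Table 10.4 and splits their total sum of squares into journey, cylinder and residual parts by squaring and adding totals, as in Table 10.3.

```python
def two_way_split(table):
    """Split the total sum of squares of a rows x columns table (one reading
    per cell) into row, column and residual parts, using totals."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table)
    correction = grand ** 2 / (n_rows * n_cols)
    total = sum(x * x for row in table for x in row) - correction
    # Square each row total and divide by the number of readings in a row.
    rows = sum(sum(row) ** 2 for row in table) / n_cols - correction
    # The same for column totals.
    cols = sum(sum(col) ** 2 for col in zip(*table)) / n_rows - correction
    return rows, cols, total - rows - cols, total


# Pot 1, run 1 of Table 10.4: one row per journey; columns are cylinders 3, 10, 16.
pot1_run1 = [
    [47, 56, 100],
    [55, 89, 93],
    [35, 57, 56],
    [78, 67, 113],
    [33, 40, 128],
]

journeys, cylinders, residual, total = two_way_split(pot1_run1)
print(f"cylinders {cylinders:.2f} (2 d.f.)")
print(f"journeys  {journeys:.2f} (4 d.f.)")
print(f"residual  {residual:.2f} (8 d.f.)")
print(f"total     {total:.2f} (14 d.f.)")
```

If the same function is applied to each of the eight pot-runs and the results are added, the sums should reproduce rows (1) to (4) of Table 10.5.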

TABLE 10.41

                 Cylinder
             3       10       16
Pot 1     49·6     61·8     98·0
Pot 2     39·6     61·0    101·0

Now consider the pot means, that is, the pot totals of Table 10.4
divided by 15. There are eight of them, giving seven degrees of
freedom, and their sum of squares may be divided into two parts:
between runs, with three degrees, and a residual within runs, with
four degrees. To obtain sums of squares comparable with those within
pots, we should multiply the sums of squares obtained from the pot
means by the number of readings in a pot, 15, just as in equation
(10.5) we multiplied the squares of the row and column deviations by
the number of readings in a row, or in a column. This is equivalent to
summing the squared deviations over all the original individuals.
When this process is carried out, the results in section (b) of
Table 10.5 are obtained.
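Here is a short Python sketch of the same step. Unlike the totals method, it works from the pot means, and the variable names are our own. The pot totals come from Table 10.4. Each pot mean is a total divided by 15, and the sums of squares of the means are multiplied by 15 at the end, as the text describes.

```python
# Pot totals from Table 10.4, grouped by run as (pot 1, pot 2); 15 readings per pot.
pot_totals = [[1047, 1008], [702, 1178], [544, 723], [613, 820]]
READINGS_PER_POT = 15

pot_means = [[t / READINGS_PER_POT for t in run] for run in pot_totals]
run_means = [sum(run) / len(run) for run in pot_means]
grand_mean = sum(run_means) / len(run_means)

# Between runs (3 d.f.): each run mean stands for 2 pot means.
ss_runs = sum(2 * (m - grand_mean) ** 2 for m in run_means)
# Residual between pots within runs (4 d.f.).
ss_resid = sum(
    (pm - rm) ** 2 for run, rm in zip(pot_means, run_means) for pm in run
)

# Scale up to the original individuals.
between_runs = READINGS_PER_POT * ss_runs
between_pots = READINGS_PER_POT * ss_resid
print(f"between runs           {between_runs:.2f}")
print(f"residual between pots  {between_pots:.2f}")
```

Both results should agree with rows (5) and (6) of Table 10.5 to within rounding.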

The cylinder variations given in row (1) may be analysed further.
The cylinder means for run 1 are in Table 10.41. This is like
Table 6.8, and the variance may be analysed as in Table 10.3 into
parts associated with pots (one degree), a cylinder variation common
to both pots (two degrees) and a residual (two degrees). The residual
represents an extra cylinder effect that is specific to each pot and
is due to the deviations of the separate cylinder readings for each
pot from the cylinder readings common to both pots. Similar
analyses may be carried out for the four runs, and the sums of
squares and degrees of freedom added. When this is done and the
resulting sums of squares are multiplied by five, to make them
comparable with section (a) of Table 10.5, the results are as shown
in rows (6), (8) and (9) together, and (10) of that table. It would
be just as reasonable, statistically, to describe the residual in
row (10) as an extra pot effect that is specific to each cylinder, and
to show impartiality we may call the source of variation an
interaction between cylinders and pots. From a practical point of view,
however, it seems more appropriate in this instance to concentrate
attention on the cylinder variations.

Yet another subdivision of the cylinder effect common to both
pots in a run (rows 8 and 9) is possible. The cylinder means common
to both pots in each of the four runs are in Table 10.42. The total
sum of squares between these means has 11 degrees of freedom
and may be divided into parts associated with runs (3 degrees),
cylinders common to all runs (2 degrees) and a residual (6 degrees).
The residual represents a differential cylinder trend for each run that
is common to the two pots in the run, i.e. a cylinder-run interaction.
The sums of squares of these items, multiplied by 10, and the degrees
of freedom are entered in rows (5), (8) and (9) of Table 10.5.

TABLE 10.42

                 Cylinder
             3       10       16
Run 1     44·6     61·4     99·5
Run 2     59·2     72·5     56·3
Run 3     33·8     45·2     47·7
Run 4     35·7     55·6     52·0
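Tables 10.41 and 10.42 are ordinary two-way tables of means, so one routine can analyse both. The Python sketch below is our own illustration, and the name `weighted_two_way` is hypothetical. It splits a table of means into row, column and residual sums of squares. Each result is then multiplied by the number of readings behind a mean: 5 for Table 10.41 and 10 for Table 10.42.

- For Table 10.42, the three results correspond to rows (5), (8) and (9) of Table 10.5.
- For Table 10.41, they are run 1's contributions to row (6), to rows (8) and (9) together, and to row (10).

```python
def weighted_two_way(means, weight):
    """Row, column and residual sums of squares of a two-way table of means,
    each multiplied by `weight`, the number of readings behind every mean."""
    r, c = len(means), len(means[0])
    grand = sum(map(sum, means)) / (r * c)
    row_means = [sum(row) / c for row in means]
    col_means = [sum(col) / r for col in zip(*means)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in means for x in row)
    ss_resid = ss_total - ss_rows - ss_cols
    return tuple(weight * s for s in (ss_rows, ss_cols, ss_resid))


# Table 10.41: cylinder means (3, 10, 16) for the two pots of run 1.
table_10_41 = [[49.6, 61.8, 98.0], [39.6, 61.0, 101.0]]
pots, cyls, interaction = weighted_two_way(table_10_41, weight=5)
print(f"run 1: pots {pots:.1f}, cylinders {cyls:.1f}, residual {interaction:.1f}")

# Table 10.42: cylinder means common to both pots, one row per run.
table_10_42 = [
    [44.6, 61.4, 99.5],
    [59.2, 72.5, 56.3],
    [33.8, 45.2, 47.7],
    [35.7, 55.6, 52.0],
]
runs, cyls_all, cyl_by_run = weighted_two_way(table_10_42, weight=10)
print(f"rows (5), (8), (9): {runs:.2f}, {cyls_all:.2f}, {cyl_by_run:.2f}")
```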

In a precisely similar way the journey variations of row (2),
Table 10.5, may be further analysed into the parts shown in
section (d) of that table.

Before testing the variances in Table 10.5 for significance, it will
be well to determine of what corrected variances they are estimates.
Let the variances entered in the table be denoted by $v$ with a
subscript corresponding to the row of the table, i.e. $v_1$, $v_2$, etc.
Let the corresponding corrected variances be denoted by $\sigma^2$ with
a corresponding subscript. Thus, $\sigma_1^2$ is the variance that would be
found between the cylinder means within each pot if an infinite
number of observations could be made on each cylinder and there
were an infinite number of pots. Further, let the number of the
original individuals contributing to the mean of each individual
factor be $n$ with an appropriate subscript. For example, $n_1$ is the
number of readings per cylinder within each pot, $n_1 = 5$; $n_9$ is the
number of readings per cylinder mean for each run, $n_9 = 10$; and
so on. Then the values of the $n$'s and the relations between the
observed and corrected variances, expressed in the manner of
equations (10.6), are as follows:


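$$
\begin{aligned}
n_1 &= 5, & v_1 &\to n_1\sigma_1^2 + \sigma_3^2,\\
n_2 &= 3, & v_2 &\to n_2\sigma_2^2 + \sigma_3^2,\\
n_3 &= 1, & v_3 &\to \sigma_3^2,\\
n_5 &= 30, & v_5 &\to n_5\sigma_5^2 + v_6,\\
n_6 &= 15, & v_6 &\to n_6\sigma_6^2 + \sigma_3^2,\\
n_8 &= 40, & v_8 &\to n_8\sigma_8^2 + v_9,\\
n_9 &= 10, & v_9 &\to n_9\sigma_9^2 + v_{10},\\
n_{10} &= n_1 = 5, & v_{10} &\to n_{10}\sigma_{10}^2 + \sigma_3^2,\\
n_{12} &= 24, & v_{12} &\to n_{12}\sigma_{12}^2 + v_{13},\\
n_{13} &= 6, & v_{13} &\to n_{13}\sigma_{13}^2 + v_{14},\\
n_{14} &= n_2 = 3, & v_{14} &\to n_{14}\sigma_{14}^2 + \sigma_3^2.
\end{aligned}
$$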


We shall now proceed to discuss the derivation of these expressions. 

The relations between $v_1$, $v_2$ and $v_3$ are the same as those between
the variances of Table 10.3, except that we have combined the
estimates from several tables. It will be noted that the cylinder and
journey effects may vary from one pot or run to another, so that $\sigma_1^2$
and $\sigma_2^2$ may themselves be complex. The residual variance $\sigma_3^2$ is due
partly to the fact that only a limited number of sections on each
cylinder were examined for seed and partly to a real variation that
could not be associated with any factor.

The relation between $v_5$ and $v_6$ follows from equation (10.6) and
from the fact that, after performing the analysis on the pot means,
we multiplied the sums of squares by the number of individuals
per pot. When the analysis is performed on the pot means, the $n$
of equation (10.6) is the number of pots per run, 2. We then
multiplied the sums of squares by the number of readings per pot,
15. So in the above equation $\sigma_5^2$ is multiplied by the product
of these, which is $n_5 = 30$. The residual variance measured between the
pot means is the sum of two parts. It is the corrected variance
between pots, plus the square of the standard error, or variance,
due to the fact that the residual variations within a pot are only
sampled by $n_6 = 15$ individuals; this added variance is $\sigma_3^2/15$. The
variance $v_6$ is $n_6$ times this measured variance between pot means.
The cylinder and journey effects do not contribute to the error in
a pot mean, since they are expressed as deviations from the pot
mean.
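The two relations just described can be set out as a short derivation. This is a sketch under the assumptions stated in the text: the within-pot residuals are independent with variance $\sigma_3^2$, and the cylinder and journey effects cancel out of each pot mean. There are 2 pots per run and 15 readings per pot.

$$
\begin{aligned}
\text{mean square of pot means about their run means} &\to \sigma_6^2 + \frac{\sigma_3^2}{15},\\
v_6 &\to 15\left(\sigma_6^2 + \frac{\sigma_3^2}{15}\right) = n_6\sigma_6^2 + \sigma_3^2,\\
v_5 &\to 15\left(2\sigma_5^2 + \sigma_6^2 + \frac{\sigma_3^2}{15}\right) = n_5\sigma_5^2 + \left(n_6\sigma_6^2 + \sigma_3^2\right),
\end{aligned}
$$

with $n_5 = 2 \times 15 = 30$. The bracketed term in the last line is the quantity estimated by $v_6$, which is why the relation is written $v_5 \to n_5\sigma_5^2 + v_6$.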

The relations between $v_8$ and $v_9$, between $v_9$ and $v_{10}$, between
$v_{12}$ and $v_{13}$, and between $v_{13}$ and $v_{14}$ may easily be deduced in a
similar manner; these variances have all been found from analyses
like that in Table 10.3. The apparent variance $v_{10}$ is contributed
to only by two things: the part of the cylinder trend that is specific
to the pot, $\sigma_{10}^2$, and the first residual, $\sigma_3^2$. The relation
given above is easily deduced. Similar remarks apply to the journey effect.

We are now in a position to test the variances given in Table 10.5
for statistical significance, using Fisher's z-test. If there is no
cylinder effect, $\sigma_1^2 = 0$, and the test for this effect consists
in establishing the statistical significance of the difference between
$v_1$ and $v_3$. Similarly, $v_2$ should be compared with $v_3$, $v_5$ with $v_6$,
$v_6$ with $v_3$, $v_8$ with $v_9$, and so on. When these tests are carried out,
the values of z corresponding to the variances marked with two
asterisks are above the 1 per cent. point, and that with one asterisk
is between the 5 and 1 per cent. points. We shall presume that
these show the existence of significant effects.* All the other values
of z are below the 5 per cent. point.
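Fisher's z is half the natural logarithm of the ratio of two variances. So z exceeds a given percentage point exactly when the variance ratio exceeds the corresponding point of the variance-ratio distribution. Printed z tables are not reproduced here. As a substitute of our own, the Python sketch below computes the upper-tail probability of the ratio directly. It uses the standard continued-fraction evaluation of the incomplete beta function. The helper names are hypothetical.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def regularized_beta(a, b, x):
    """I_x(a, b), the regularised incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _beta_continued_fraction(a, b, x) / a
    return 1.0 - math.exp(log_front) * _beta_continued_fraction(b, a, 1.0 - x) / b


def ratio_tail(v_num, df_num, v_den, df_den):
    """Chance that the ratio v_num / v_den is equalled or exceeded when both
    variances are estimates of the same quantity."""
    f = v_num / v_den
    return regularized_beta(df_den / 2.0, df_num / 2.0, df_den / (df_den + df_num * f))


def fisher_z(v_num, v_den):
    """Fisher's z: half the natural logarithm of the variance ratio."""
    return 0.5 * math.log(v_num / v_den)


# Two comparisons against row (3) of Table 10.5 (287.47 on 64 d.f.).
for label, (v, df) in {"(1) cylinders": (1383.28, 16),
                       "(14) journeys specific to pot": (598.91, 16)}.items():
    print(f"{label}: z = {fisher_z(v, 287.47):.3f}, "
          f"P = {ratio_tail(v, df, 287.47, 64):.4f}")
```

A probability below 0.01 corresponds to a z above the 1 per cent. point, and one between 0.01 and 0.05 to a z between the 5 and 1 per cent. points.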

The variances $v_1$ and $v_2$ are both significantly greater than $v_3$.
Even if this had not been so, we should still have been justified in
performing the further analyses of sections (c) and (d) of Table 10.5.

The variance $v_5$ is not significantly greater than $v_6$, so it is
reasonable to combine these to give an improved estimate of the
residual variance ($v_6$) between pots, based on 7 degrees of freedom;
this is in row (7) of Table 10.5. Similarly, $v_8$ is not significantly
greater than $v_9$, and it is reasonable to work on the conclusion that
the cylinder effect varies from run to run and is measured by the
combined variance based on 8 degrees. This combined estimate
of $v_9$ is significantly greater than $v_{10}$. We see, however, that $v_{10}$ is
not greater than $v_3$, and we conclude that the cylinder effect does not
vary from pot to pot within a run. Similar conclusions may be
reached regarding the journey variances, except that $v_{14}$ is greater
than $v_3$. We are justified in combining rows (10) and (3) to obtain
an improved estimate of $\sigma_3^2$ based on 72 degrees of freedom; it
is 275·90.

* The one variance with one asterisk has a value of z very little below
the 1 per cent. point.

The significant effects and their corrected variances estimated
from the apparent variances of Table 10.5 may be summarised as
follows:

$$
\begin{aligned}
&\text{Differences between pots:} & \sigma_6^2 &= \frac{2\,524.89 - 275.90}{15} = 149.93\\
&\text{Cylinder effect varying from run to run:} & \sigma_9^2 &= \frac{2\,583.20 - 275.90}{10} = 230.73\\
&\text{Journey effect, part due to variation from run to run:} & \sigma_{13}^2 &= \frac{1\,770.88 - 598.91}{6} = 195.33\\
&\text{Journey trend, part due to variation from pot to pot within a run:} & \sigma_{14}^2 &= \frac{598.91 - 275.90}{3} = 107.67\\
&\text{Residual (unaccounted variation):} & \sigma_3^2 &= 275.90
\end{aligned}
$$
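The arithmetic of this summary can be checked with a few lines of Python. This is a sketch only, and the variable names are ours. The pooled residual combines rows (10) and (3) of Table 10.5, as the text does, and each divisor is the corresponding $n$ given earlier.

```python
# Sums of squares, degrees of freedom and variances from Table 10.5.
ss_3, df_3 = 18398.14, 64     # (3) residual within pots
ss_10, df_10 = 1466.93, 8     # (10) cylinder effect specific to pot
v_6, v_89, v_1213, v_14 = 2524.89, 2583.20, 1770.88, 598.91

# Rows (10) and (3) pooled: the improved estimate of sigma_3^2 (72 d.f.).
sigma2_3 = (ss_10 + ss_3) / (df_10 + df_3)

corrected = {
    "pots, sigma_6^2": (v_6 - sigma2_3) / 15,
    "cylinders by run, sigma_9^2": (v_89 - sigma2_3) / 10,
    "journeys by run, sigma_13^2": (v_1213 - v_14) / 6,
    "journeys by pot, sigma_14^2": (v_14 - sigma2_3) / 3,
    "residual, sigma_3^2": sigma2_3,
}
for name, value in corrected.items():
    print(f"{name:30s} {value:8.2f}")
```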

These estimates of corrected variance measure the relative
importance of the several factors in the way described in section 10.3.
The estimates are valid, however, only in so far as the third, tenth
and sixteenth cylinders are representative of the eighteen or so
that may be made from a pot. Likewise, the first five journeys must be
representative of all the journeys possible in a run, in so far as there is
any secular variation. For example, all the cylinders between the
third and tenth could have fewer seed, and those between the tenth and
sixteenth more seed, than the three measured. This
would be a source of variation entirely overlooked by our analysis,
of which random errors take no account. We shall assume the
representativeness of the cylinders and journeys.



All the above values of the true variances are estimates subject
to random errors. This is because the cylinder and journey effects vary
from run to run or from pot to pot, and the few that have provided the
estimates are a random sample from an infinite population of trends
obtained from an infinite population of runs. Had there been a
significant cylinder effect common to all runs, it would not have
been subject to random errors in quite the same way. However
many runs there had been, there would have been but two degrees
of freedom for $v_8$. The effect of a large number of pots is merely to
increase $n_8$ in the relation $v_8 \to n_8\sigma_8^2 + v_9$
and reduce the sampling error of $v_9$. This reduces the effect of random
errors in $v_9$ in estimating $\sigma_8^2$. Similar remarks apply to the journey effect.

We have developed the foregoing analysis by treating means (pot
means, cylinder means, and so on). The computation is much easier,
however, if corresponding totals are dealt with in the way outlined
at the end of section 10.3. The general rule is first to square and
add the various totals and divide by the number of individuals
forming each total, producing a series of “adjusted sums.” Then
the sum of squares entered in Table 10.5 for any factor measured
as a series of deviations from the means corresponding to a second
factor is the “adjusted sum” for the first factor minus that for the
second. As an example, let us find the sum of squares entered in
Table 10.5 for rows (8) and (9) together. The first factor is represented
by the run-cylinder totals and the second by the run totals (i.e.
in the analysis, run-cylinder means were measured as deviations
from run means for all cylinders). There are 10 observations per
run-cylinder total and 30 per run total; the adjusted sums are

$$\frac{446^2 + 614^2 + 995^2 + 592^2 + \cdots}{10} = 401\,205.70$$

and

$$\frac{2\,055^2 + 1\,880^2 + \cdots}{30} = 380\,540.10,$$

and the sum of squares is the difference between these, viz.
20 665·60.
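The rule of “adjusted sums” is easy to mechanise. The Python sketch below is our own illustration, and the names `DATA` and `adjusted_sum` are hypothetical.

- It holds the readings of Table 10.4 in a nested list indexed by run, pot, journey and cylinder.
- It forms each adjusted sum by grouping the readings on the relevant indices.
- It takes differences as described above.

Each printed line should agree with the corresponding row of Table 10.5 to within rounding.

```python
from collections import defaultdict

# Table 10.4: DATA[run][pot][journey] = readings for cylinders 3, 10, 16.
DATA = [
    [[[47, 56, 100], [55, 89, 93], [35, 57, 56], [78, 67, 113], [33, 40, 128]],
     [[52, 61, 88], [49, 62, 97], [34, 60, 72], [47, 93, 118], [16, 29, 130]]],
    [[[52, 66, 36], [21, 61, 49], [31, 39, 25], [43, 72, 52], [37, 51, 67]],
     [[65, 80, 40], [122, 97, 79], [45, 54, 72], [109, 120, 80], [67, 85, 63]]],
    [[[50, 61, 60], [33, 27, 49], [24, 39, 24], [18, 18, 43], [28, 42, 28]],
     [[75, 139, 130], [46, 58, 63], [15, 33, 39], [22, 16, 19], [27, 19, 22]]],
    [[[24, 34, 43], [24, 49, 42], [21, 21, 51], [21, 69, 48], [76, 48, 42]],
     [[46, 66, 24], [40, 117, 105], [30, 28, 34], [36, 64, 53], [39, 60, 78]]],
]


def adjusted_sum(key):
    """Square each group total, divide by the number in the group, and add.
    `key(run, pot, journey, cylinder)` names the group a reading belongs to."""
    totals, counts = defaultdict(int), defaultdict(int)
    for r, run in enumerate(DATA):
        for p, pot in enumerate(run):
            for j, journey in enumerate(pot):
                for c, x in enumerate(journey):
                    k = key(r, p, j, c)
                    totals[k] += x
                    counts[k] += 1
    return sum(t * t / counts[k] for k, t in totals.items())


A = {
    "grand": adjusted_sum(lambda r, p, j, c: None),
    "run": adjusted_sum(lambda r, p, j, c: r),
    "pot": adjusted_sum(lambda r, p, j, c: (r, p)),
    "cyl": adjusted_sum(lambda r, p, j, c: c),
    "run_cyl": adjusted_sum(lambda r, p, j, c: (r, c)),
    "pot_cyl": adjusted_sum(lambda r, p, j, c: (r, p, c)),
    "jour": adjusted_sum(lambda r, p, j, c: j),
    "run_jour": adjusted_sum(lambda r, p, j, c: (r, j)),
    "pot_jour": adjusted_sum(lambda r, p, j, c: (r, p, j)),
    "each": adjusted_sum(lambda r, p, j, c: (r, p, j, c)),
}

# Sums of squares for the rows of Table 10.5.
rows = {
    1: A["pot_cyl"] - A["pot"],
    2: A["pot_jour"] - A["pot"],
    4: A["each"] - A["pot"],
    5: A["run"] - A["grand"],
    7: A["pot"] - A["grand"],
    8: A["cyl"] - A["grand"],
    "8+9": A["run_cyl"] - A["run"],
    12: A["jour"] - A["grand"],
    "12+13": A["run_jour"] - A["run"],
}
rows[3] = rows[4] - rows[1] - rows[2]
rows[6] = rows[7] - rows[5]
rows[9] = rows["8+9"] - rows[8]
rows[10] = rows[1] - rows["8+9"]
rows[13] = rows["12+13"] - rows[12]
rows[14] = rows[2] - rows["12+13"]

for label in [1, 2, 3, 4, 5, 6, 7, 8, 9, "8+9", 10, 12, 13, "12+13", 14]:
    print(f"row {label!s:>6}: {rows[label]:10.2f}")
```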


Arrangements for Agricultural and Analogous Experiments

10.4. In no subject has the application of the principles we have
just introduced for reducing the errors of comparisons of treatments
or conditions been so complete as in that of agricultural experiment.
The actual statistical technique is capable of application to other
subjects, and it illustrates well the analysis of complex variations.
For these reasons, as well as because of the practical importance
of the question itself, we shall deal explicitly with the arrangement
of plots for field trials. There is an example of an application to
another subject in section 10.5.

It is well known that there are regional changes in natural fertility
over areas of land. If varieties of corn (for example) are
compared by sowing one large plot with each and measuring the
yields, any measured differences may be
due to variations in soil fertility rather than to varieties of corn.

TABLE 10.6

Yields of Grain in Grammes
(Plots … × 2½ ft.)

                Column
Row      1      2      3      4      5
 1     360    324    306    374    277
 2     287    324    282    252    277
 3     374    302    247    234    256
 4     316    304    232    261    271
 5     354    364    299    302    201
 6     295    294    298    313    268
 7     197    201    236    209    217
 8     335    320    259    205    207
 9     267    262    379    332    268
10     291    300    350    383    264

For that reason the necessity of repeating the comparisons over
several plots, so that the effect of soil variations can be assessed, is
well recognised. It is true, also, that these variations often exhibit
trends, so that if small plots are taken, neighbouring plots tend to be
alike in average fertility. The art of planning an experiment
lies in arranging that the varieties or conditions under investigation
(we shall call them treatments) occur in neighbouring plots, so that
the larger regional changes in fertility are eliminated from the
comparison. It is desirable also that the standard error of the
differences due to the uneliminated variations in fertility can be
estimated, so that the significance of any difference can be tested.

For the purposes of studying soil variations, many uniformity
trials have been conducted. The data of Table 10.6, taken from a
paper by Christidis (1931), are the results of one conducted at
Cambridge. They are the yields of corn from fifty adjacent plots of
the size shown in Table 10.6, treated uniformly and harvested
separately. If we wished to compare five treatments (say), we could
distribute them at random over the fifty plots. In such a case the
standard error of the mean for any one treatment would be
$\sqrt{v/10}$, where $v$ is the residual variance between plots after
treatments have been eliminated. The standard error of the difference
between two means would be $\sqrt{2}$ times this. Here the treatments are the
same, and the residual variance for such an arrangement is the mean
variance for the fifty yields, which equals 95 940/49 = 1 958.
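As a small worked illustration in Python (the variance is the one quoted in the text), here are the standard errors for five treatments scattered at random over the fifty plots, with ten plots each.

```python
import math

total_ss, total_df = 95940.0, 49
v = total_ss / total_df          # residual variance for a fully random layout
replicates = 50 // 5             # five treatments on fifty plots

se_mean = math.sqrt(v / replicates)
se_diff = math.sqrt(2) * se_mean  # difference between two treatment means
print(f"v = {v:.0f}, s.e. of mean = {se_mean:.1f}, s.e. of difference = {se_diff:.1f}")
```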

10.41. We might expect to eliminate the variations between rows,
however, by ensuring that every treatment occurs once in each row,
so that only the residual within-row variance contributes to the errors
of the differences. Labelling the treatments A, B, C, D and E, such
an arrangement as the following could be tried:

A B C D E

A B C D E

We have analysed the variance of the yields of Table 10.6 into
three portions as in Table 10.3: variations between rows, between
treatments arranged as shown, and a residual within rows. The results
are in Table 10.7. There are ten rows giving nine degrees of freedom,
five treatments giving four degrees, and fifty plots giving forty-nine
degrees for the total, leaving thirty-six for the residual. We know,
however, that the five treatments are all the same, and so may combine
the treatment and residual sums of squares to give a within-row
variance. We see that by eliminating the row variations, the
variance is reduced from 1 958 to 1 218. That is, a comparison based on
about 12 replicates properly arranged with one of each treatment
per row is as good as one based on about 20 perfectly random
replicates. In practice, if the treatments differ, their significance
is tested on the basis of the residual variance of 1 063. In this example
the treatment variance (2 616) paradoxically seems greater than the
residual. For this comparison z = 0·45, which, being only just below the
5 per cent. point, is suggestive (although not strongly) of a real
effect. This is because we have neglected a fundamental principle:
the plots are not randomly distributed within the restrictions
imposed, and the treatment differences coincide with column
differences. The arrangement would have been sound if we had settled
the order of the plots in each row separately by chance (say by
drawing lettered tickets from a hat). The residual variance (1 063 in
Table 10.7) would then not have been very different from the variance
within rows (1 218); the reader is recommended to try it out on
the data.

TABLE 10.7

Analysis of Variance of Crop Yields

                             Sums of  Degrees of
Source of Variation          Squares     Freedom  Variance
Between rows                47 201·6           9     5 245
Between treatments          10 462·6           4     2 616
Residual                    38 275·8          36     1 063
  Treatments + residual     48 738·4          40     1 218
Total                       95 940·0          49     1 958
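Two figures quoted in this paragraph can be checked with a few lines of Python: the value of z and the “about 12 replicates.” This is a sketch, and the variances are those of Table 10.7.

```python
import math

v_total = 95940.0 / 49             # random arrangement, 49 d.f.
v_within_rows = 48738.4 / 40       # treatments + residual, 40 d.f.
v_treat, v_resid = 10462.6 / 4, 38275.8 / 36

# Replicates arranged one per row that are as precise as 20 random replicates.
equivalent = 20 * v_within_rows / v_total
z = 0.5 * math.log(v_treat / v_resid)   # Fisher's z, treatments vs residual
print(f"equivalent replicates = {equivalent:.1f}, z = {z:.2f}")
```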

Other systematic arrangements have been devised to overcome
such obvious defects as those shown by the one just condemned. Of
these, the “chessboard” has been very popular. This places every
treatment as close to every other as possible, and an arrangement
for seven treatments in rows of five is shown here:

A B C D E
F G A B C
D E F G A
B C D E F

It will be seen that the treatments may be assigned in the same
order, row after row, and yet they are all represented in every column.
If there are as many treatments as plots per row, each one may be
placed two places to the right of its position in the previous row, viz.

A B C D E
D E A B C
B C D E A
E A B C D
C D E A B
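Both systematic layouts can be generated mechanically. The Python sketch below is an illustration only, and the function names are ours. It reproduces the two arrangements shown and checks that every treatment occurs exactly once in every column. In the seven-treatment case this needs seven rows to complete the cycle.

```python
from string import ascii_uppercase


def running_layout(n_treatments, row_length, n_rows):
    """Write treatments A, B, C, ... in the same order, row after row."""
    letters = ascii_uppercase[:n_treatments]
    seq = [letters[i % n_treatments] for i in range(row_length * n_rows)]
    return ["".join(seq[i:i + row_length]) for i in range(0, len(seq), row_length)]


def shifted_layout(n_treatments, shift=2):
    """Square layout: each row is the previous one moved `shift` places right."""
    row = ascii_uppercase[:n_treatments]
    rows = [row]
    for _ in range(n_treatments - 1):
        row = row[-shift:] + row[:-shift]
        rows.append(row)
    return rows


def each_treatment_once_per_column(rows, n_treatments):
    letters = set(ascii_uppercase[:n_treatments])
    return all(len(col) == n_treatments and set(col) == letters
               for col in zip(*rows))


chessboard = running_layout(7, 5, 7)       # seven rows complete the cycle
square = shifted_layout(5)
print("\n".join(" ".join(r) for r in chessboard[:4]))   # the four rows shown
print(each_treatment_once_per_column(chessboard, 7))
print("\n".join(" ".join(r) for r in square))
print(each_treatment_once_per_column(square, 5))
```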
The usual method of treating the results is to divide them into groups
containing all the treatments in the order in which they occur. In
the seven-treatment example above, the first group would contain
the first row and the first two plots of the second. The second would
contain the last three plots in the second row and the first four in
the third, and so on. For five treatments, the groups would be rows.
Then the total variance is analysed into three parts: between groups,
between treatments, and a residual from which the standard error of
a difference may be calculated. The danger of column variations
giving spurious effects is eliminated. However, such arrangements do
produce regular, if more complex, patterns, and there is a remote
chance of the soil varying periodically and more or less in step with
the pattern. Further, the arrangement does not make the best
possible comparison. Taking the set of five treatments, the groups
are rows, and the residual variance (which for Table 10.7 is 1 218)
includes column differences, although these do not enter into the
comparisons. The estimates of error in such systematic arrangements
are thus not valid.

Restricted Random Arrangements of Plots 
Random Groups 

10.42. For large numbers of treatments it is better to have groups of
as many plots as there are treatments, arranged in clumps rather
than in rows, and to assign the position of each treatment within
the group by chance. The random arrangement of five treatments in
the rows of Table 10.6 is a special case of this, and the analysis of
the yields is the same as in Table 10.7.*

The Latin Square 

The application of the Latin square arrangement to field
experiments is due to Fisher (1936).† If there are n treatments to be
compared, a block of plots having n rows and n columns is marked
out. The treatments are then distributed at random, with the
restriction that each one must occur once in each row and once in each
column. The field of Table 10.6 has 5 × 10 plots, and so is suitable
for the formation of two Latin squares; we give an example below.

* The results may be written in a table of rows and columns, each row
referring to a group and each column to a treatment; then the terms of
Table 10.3 apply directly.

† The first edition of this book appeared in 1925.

          Column
   1   2   3   4   5

   B   E   A   D   C
   D   A   E   C   B
   E   D   C   B   A
   A   C   B   E   D
   C   B   D   A   E

   D   B   C   A   E
   A   C   E   D   B
   B   A   D   E   C
   E   D   B   C   A
   C   E   A   B   D

The variance of such an arrangement may be fairly simply analysed
into five parts:

(1) between 2 blocks (1 degree of freedom);

(2) between 10 rows (the deviations are measured from the 2 block
means, giving 8 degrees of freedom);

(3) between columns (we regard the 10 columns of the 2 blocks as
contributing separately to the variance, giving 8 degrees of freedom);

(4) between treatments (4 degrees of freedom); and

(5) a residual with 28 degrees of freedom.

This gives a total of 49 degrees. Equation (10.4) may be extended to
give equation (10.7), where the same notation is used, the suffixes r
referring to blocks, s to rows, t to columns and u to treatments.

$$
(x - \bar{x}) = (\bar{x}_r - \bar{x}) + (\bar{x}_s - \bar{x}_r) + (\bar{x}_t - \bar{x}_r) + (\bar{x}_u - \bar{x}) + d. \qquad (10.7)
$$

The rows and columns are measured as deviations from their block
means, the blocks and treatments from the grand mean. When this
is squared and summed, the product terms become zero, because
each treatment is represented equally in each row and so on. We
then have equation (10.8), which is parallel to (10.5) and may be
entered in a table of analysis of variance.

$$
S(x - \bar{x})^2 = n^2 S(\bar{x}_r - \bar{x})^2 + n S(\bar{x}_s - \bar{x}_r)^2
+ n S(\bar{x}_t - \bar{x}_r)^2 + 2n S(\bar{x}_u - \bar{x})^2 + S d^2, \qquad (10.8)
$$

where n is the number of rows per block.

These terms are the sums of squares of deviations of the various
means from the block or grand mean, multiplied by the number of
observations contributing to those various means. The actual
arithmetic is far less complicated than the algebra, and if the reader will
work through an example, he will soon gain confidence.

When there is an awkward number of treatments (like 6), the means
are not exact when calculated to one or two decimal places. In that
case the following arithmetical scheme, extended from that in
section 10.3, is convenient:

(1) Square every plot yield and sum,

(2) Find the total yields for the blocks, square and sum them
and divide by the number of plots per block,

(3) Find the total yields for the rows, square and sum them and
divide by the number of plots per row,

(4) Repeat this for columns,

(5) Repeat this for treatments, and

(6) Square the grand total yield and divide by the total number
of plots.

Then the terms in the following are equivalent to the corresponding
ones in equation (10.8):

$$
[(1) - (6)] = [(2) - (6)] + [(3) - (2)] + [(4) - (2)] + [(5) - (6)] + Sd^2.
$$
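The six-step scheme translates directly into code. The Python sketch below is our own illustration, and the name `latin_square_anova` is hypothetical. Each block is given as a grid of yields and a matching grid of treatment letters. The function returns the sums of squares of equation (10.8), with the residual $Sd^2$ found by subtraction. The example uses the two squares of the layout above. Its yields are invented and purely additive (they are not the figures of Table 10.6), so the residual should vanish.

```python
from collections import defaultdict


def latin_square_anova(blocks):
    """Sums of squares for one or more n x n Latin squares (equation 10.8).

    `blocks` is a list of (yields, letters) pairs, each an n x n grid."""
    n = len(blocks[0][0])
    plots = [(b, i, j, letters[i][j], yields[i][j])
             for b, (yields, letters) in enumerate(blocks)
             for i in range(n) for j in range(n)]
    block_t, row_t, col_t, trt_t = (defaultdict(float) for _ in range(4))
    for b, i, j, trt, y in plots:
        block_t[b] += y
        row_t[b, i] += y        # rows belong to their own block
        col_t[b, j] += y        # and so do columns
        trt_t[trt] += y

    def adjusted(totals, size):
        return sum(t * t for t in totals.values()) / size

    s1 = sum(p[-1] ** 2 for p in plots)                  # step (1)
    s2 = adjusted(block_t, n * n)                        # step (2)
    s3 = adjusted(row_t, n)                              # step (3)
    s4 = adjusted(col_t, n)                              # step (4)
    s5 = adjusted(trt_t, n * len(blocks))                # step (5)
    s6 = sum(p[-1] for p in plots) ** 2 / len(plots)     # step (6)
    ss = {"blocks": s2 - s6, "rows": s3 - s2, "columns": s4 - s2,
          "treatments": s5 - s6}
    ss["residual"] = (s1 - s6) - sum(ss.values())
    ss["total"] = s1 - s6
    return ss


# Invented, purely additive yields: block + row + column + treatment effects.
square1 = ["BEADC", "DAECB", "EDCBA", "ACBED", "CBDAE"]
square2 = ["DBCAE", "ACEDB", "BADEC", "EDBCA", "CEABD"]
effect = {"A": 0, "B": 5, "C": -3, "D": 8, "E": 2}


def additive_yields(letters, block):
    return [[100 + 10 * block + 2 * i - j + effect[t] for j, t in enumerate(row)]
            for i, row in enumerate(letters)]


blocks = [(additive_yields(square1, 0), square1), (additive_yields(square2, 1), square2)]
for name, value in latin_square_anova(blocks).items():
    print(f"{name:10s} {value:10.2f}")
```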

The data of Table 10.6, with the above arrangement of imaginary
treatments, have been reduced in this way, and the analysis is in
Table 10.71. The treatment variance, although greater than the
final residual, is not significantly so (z = 0·38, while z = 0·50
lies on the 5 per cent. point). We will therefore combine them and
regard the mean variance 866, based on 32 degrees of freedom, as the
residual. The row and column variances are both greater than the residual
(the z's lie above the 5 per cent. point). The arrangement has
reduced the residual variance to 866 (i.e. 9 replicates so arranged
are as good as 20 at random).

The block variance may not be compared with the residual, 
because the rows and columns are not common to both blocks, and 
their variations contribute to the block differences. 

For comparing treatment means, the standard error of each mean
is $\sqrt{v/2n}$, where $v$ is the residual variance and $2n$ is the
number of plots per treatment. In making such comparisons of several
means, due regard must be paid to the general considerations
advanced in section 3.33.

It may be mentioned in passing that Latin square arrangements
are useful in many fields of experiment. They have also been used
for separating sources of variation not subject to experimental
control, as was done in section 10.31. The “treatments,” the rows
and the columns may represent three superimposed factors, and
there may be several squares, each with different individual
representatives of the factors. Then the three corrected variances are of
more interest than the means for the “treatments” and their
standard errors.

TABLE 10.71

Analysis of Variance of Crop Yields

                             Sums of  Degrees of
Source of Variation          Squares     Freedom  Variance
Blocks                       3 698·0           1     3 698
Rows                        43 503·6           8     5 438
Columns                     21 042·4           8     2 630
Treatments                   6 452·6           4     1 613
Residual                    21 243·4          28       759
  Treatments + residual     27 696·0          32       866
Total                       95 940·0          49     1 958

Every separate experiment must be arranged in its own, independently
and randomly chosen Latin square. To form a square, a number of cards,
one for each treatment, may be lettered and shaken up in a hat.
Then the letters may be assigned to the positions in a row in the
order in which the corresponding cards are drawn. The whole
procedure may be repeated for the second and subsequent rows.
Proceeding in this way, however, it will usually be found that after
three or four rows, one or two letters appear more than once in a
column. Whenever this happens, the row should be re-drawn until
the restriction of one treatment per column is satisfied. In six-by-six
or larger squares, a good deal of adjustment may be necessary. This
may allow some scope for the unconscious introduction of a
slight degree of system in the arrangement, which may tend to
repeat itself if one experimenter has to make many squares. After
a Latin square has been designed, however, the rows and columns
may be rearranged in some random order fixed by drawing numbered
cards from a hat. If this is done, it is hardly likely that any
serious lack of randomness will survive in the scheme. However,
workers who have to form many Latin squares are advised to use
such methods as those described by Yates (1933).
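The card-drawing procedure can be imitated with Python's standard `random` module. The sketch below is a shortcut of our own rather than the author's method. It starts from a cyclic square and then shuffles the rows, the columns and the assignment of letters, much as the text suggests for rearranging a square once designed. The result is always a valid Latin square. However, it does not select uniformly from all possible squares, so published methods such as Yates's remain preferable for serious work.

```python
import random
from string import ascii_uppercase


def random_latin_square(n, rng=random):
    """Shuffle the rows, columns and letters of a cyclic n x n Latin square."""
    base = [[(i + j) % n for j in range(n)] for i in range(n)]
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    letters = rng.sample(ascii_uppercase[:n], n)
    return ["".join(letters[base[r][c]] for c in cols) for r in rows]


def is_latin(square):
    n = len(square)
    return (all(len(set(row)) == n for row in square)
            and all(len(set(col)) == n for col in zip(*square)))


rng = random.Random(1937)   # fixed seed so the example is repeatable
square = random_latin_square(6, rng)
print("\n".join(" ".join(row) for row in square))
print(is_latin(square))
```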

Use of Controls

10.43. One common method of comparing a number of experimental
treatments when there are disturbing variations is to repeat
one of the treatments, a “control,” with each of the others, under
conditions as similar as possible. The differences between the
treatment and control values are then measured, and these differences are
used to compare the treatments. In order that the precision of such
an experiment may be measured, the treatment-control pairs must
be replicated. The relative positions of the treatment and control
within each pair must also be fixed independently and at random.

For example, let us suppose we wish to compare five treatments
A, B, C, D and E on the plots referred to in Table 10.6, repeating A as
a control with each of the others in a contiguous plot in the same
row. Then we can only use four columns, and the arrangement may
be like the following:


Col. 1   Col. 2   Col. 3   Col. 4
  B        A        A        C
  A        D        E        A
  A        B        C        A
  D        A        E        A
 etc.     etc.     etc.     etc.


. The differences between the treatments and controls may be found ; 
if the variance within the treatments of these differences is v and 
the number of replicates of each treatment is the standard error 
of the mean difference between any treatment and the control is 

V vjn^ while that of the mean difference between any two of the 
treatments B to E is a/ 2vln. 

Since the data of Table 10.6 are the result of a uniformity trial,
$v$ may be estimated as twice the variance within the pairs, and from
the formula at the end of section 5.1 (p. 111), we find that $v = 737.6$.
The standard error of the difference between any two of the four
treatment effects is therefore $\sqrt{1\,475/n}$, and $8n$ plots have been
used. The standard error of the difference between treatment A
and any one of the other four is $\sqrt{737.6/n}$. If the four treatments and
the control were distributed among $8n$ plots in the random arrangements,
there would be $8n/5$ replicates and the standard error of the difference
between any two treatments would be

$$\sqrt{\frac{2 \times 1\,958 \times 5}{8n}} = \sqrt{\frac{2\,447.5}{n}}.$$

The method using a control is, in this instance, more efficient than 
the random method. On the other hand, similar calculations show 
the Latin square method to be more efficient than either except for 
comparing treatment A with B, C, D or E. 
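The comparison of the two layouts can be checked numerically. This sketch takes the figures quoted above ($v = 737.6$ for the paired differences, and 1 958 as the per-plot variance in the random-arrangement formula) and computes the standard errors for a given number of replicates $n$; the function names are illustrative only. Because every standard error is proportional to $1/\sqrt{n}$, the ratio of error variances, about 1.66 in favour of the control layout for comparisons among B to E, does not depend on $n$.

```python
import math

V_DIFF = 737.6    # v: variance of treatment-control differences (from the text)
V_PLOT = 1958.0   # per-plot variance used in the random-arrangement formula (from the text)


def se_control_design(n):
    """Standard errors for the paired-control layout with n replicates:
    (A vs another treatment, any two of B to E)."""
    return math.sqrt(V_DIFF / n), math.sqrt(2 * V_DIFF / n)


def se_random_design(n):
    """Standard error of a difference between two treatments when the same
    8n plots are shared at random among five treatments (8n/5 replicates each)."""
    replicates = 8 * n / 5
    return math.sqrt(2 * V_PLOT / replicates)


if __name__ == "__main__":
    n = 10
    a_vs_other, other_vs_other = se_control_design(n)
    random_se = se_random_design(n)
    print(f"control layout: A vs other {a_vs_other:.2f}, B-E pairs {other_vs_other:.2f}")
    print(f"random layout:  any pair {random_se:.2f}")
    # Ratio of error variances (relative efficiency); the same for every n.
    print(f"variance ratio: {(random_se / other_vs_other) ** 2:.2f}")
```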

The use of a control may provide very accurate comparisons if
it is technically convenient to bring each treatment and its control
into very close proximity. This is achieved in Beaven’s “half-drill
strip” method, in which adjacent long narrow strips are assigned to
the control and treatment in pairs. Often, the arrangement is such
as to sandwich two strips of treatments between two of controls.
This practice is to be deprecated, however, as it violates the principle
of randomness; within each pair of strips the relative positions of
the treatment and control should be decided by chance.
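Randomising the position of treatment and control within each pair takes one fair coin toss per pair. A minimal sketch, assuming treatments are labelled by letters and the control is A as in the example above (`randomise_pairs` is a hypothetical helper):

```python
import random


def randomise_pairs(treatments, control="A", rng=random):
    """Return one (left, right) pair of adjacent plots per treatment, with the
    order of treatment and control decided by a fair coin toss."""
    layout = []
    for t in treatments:
        if rng.random() < 0.5:
            layout.append((t, control))
        else:
            layout.append((control, t))
    return layout


if __name__ == "__main__":
    rng = random.Random(42)
    for replicate in range(2):
        print(randomise_pairs(["B", "C", "D", "E"], rng=rng))
```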

Multiple Factor Experiments

10.5. A multiple factor or factorial experiment is one in which
several factors are varied together. For example, in an experiment
arranged to investigate the effect of an ingredient of the size mixture
used for protecting cotton warps in weaving, there were four treat-
ments in which there were two forms of this ingredient, A and B,
and each was used in the size in two concentrations: 1 and 2 units.
The treatments were:

(i) 1 unit of ingredient A 

(ii) 2 units of ingredient A 

(iii) 1 unit of ingredient B

(iv) 2 units of ingredient B.

Further, the experiment was done twice on two separate qualities
of yarn, X and Y. Thus there were three factors: form of ingredient,
quantity of ingredient and quality of yarn. The quality measured
on the warps was the rate at which the warp threads broke during
weaving, the rate being expressed as the number per 10 000 picks.*

* One “pick” corresponds to the insertion of one weft thread.




Although the three factors were varied together in the same
experiment, the arrangement is balanced so that both ingredients
were present in both quantities and were used on both yarns.
Consequently, the deviations of the two mean breakage rates for
ingredients A and B from the grand mean are unaffected by anything
but the forms of ingredient and errors. These mean deviations are
said to measure the main effect of ingredients. Similarly, the main
effects of the quantities of ingredient and of the yarn qualities may
easily be measured, each being undisturbed by the other two factors.


TABLE 10.8

Warp Breaks per 10 000 Picks and Treatments

Yarn X
Loom           7            8            9            10
Period 1    1.4 (iv)     5.3 (i)      7.4 (ii)     3.3 (iii)
Period 2    2.2 (iii)    4.1 (ii)     8.8 (i)      1.6 (iv)
Period 3    2.1 (ii)     3.2 (iv)     4.8 (iii)    4.7 (i)
Period 4    3.4 (i)      3.3 (iii)    2.6 (iv)     4.3 (ii)

Yarn Y
Loom           15           16           17           18
Period 5    2.8 (iii)    1.5 (i)      1.9 (ii)     2.0 (iv)
Period 6    1.5 (iv)     1.2 (ii)     2.0 (i)      1.8 (iii)
Period 7    2.5 (ii)     1.4 (iv)     2.4 (iii)    2.0 (i)
Period 8    5.8 (i)      1.9 (iii)    1.7 (iv)     3.2 (ii)

It is possible, however, that the effect of varying the quantity of
ingredient may be different for form A than for form B, or,
alternatively, the difference in response of warp breakage rate to
A and B may depend on the quantity of each present. If this is so,
there is said to be an interaction between form and quantity of
ingredient. There may be interactions between any pair of the three
factors. If such interactions exist, it may also be that the effect of
increasing the quantity, say, from 1 to 2 units may be different
according to whether the ingredient is A on yarn X, A on Y,
B on X or B on Y. If such an effect exists, there is said to be a
second-order interaction between the three factors.
The main effects and interactions may be measured and tested
by an analysis of variance on the results, and with this in view we 
shall deal with results of the above experiments in full. 

TABLE 10.9

Analysis of Variance of Warp Breakage Rates

Source of Variation                   Sum of     Degrees of    Variance
                                      Squares    Freedom
(1)  Yarns .. ..                       17.701        1          17.701
(2)  Periods .. ..                     10.348        6           1.725
(3)  Looms .. ..                       36.763        6           6.127
(4)  Treatments .. ..                  28.978        6           4.830
(5)  Residual .. ..                     8.539       12           0.712
(6)  Total .. ..                      102.329       31
Main Effects
(7)  Quantity .. ..                     5.120        1           5.120
(8)  Ingredient .. ..                  17.111        1          17.111
Interactions
(9)  Quantity × Ingredient              0.046        1           0.046
(10) Quantity × Yarn                    0.320        1           0.320
(11) Ingredient × Yarn                  6.302        1           6.302
(12) Quantity × Ingredient × Yarn       0.079        1           0.079

There were four warps* of each yarn, one treatment being used
for each warp. The warps of yarn X were woven in four looms,
and the total weaving time was divided into four periods. Between
the periods, the warps were interchanged among the looms in such
a way as to complete a Latin square arrangement with periods as
rows and looms as columns. The warps of yarn Y were woven in
separate looms and periods from X, in another Latin square. The
arrangement and warp breakage rates are given in Table 10.8.
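The treatment codes in Table 10.8 can be checked against the Latin square requirement that each treatment appears once in every period and once on every loom. The lists below are transcribed from that table; `is_latin_square` is an illustrative helper.

```python
# Treatment codes from Table 10.8: rows are periods, columns are looms.
YARN_X = [  # looms 7, 8, 9, 10; periods 1 to 4
    ["iv", "i", "ii", "iii"],
    ["iii", "ii", "i", "iv"],
    ["ii", "iv", "iii", "i"],
    ["i", "iii", "iv", "ii"],
]
YARN_Y = [  # looms 15, 16, 17, 18; periods 5 to 8
    ["iii", "i", "ii", "iv"],
    ["iv", "ii", "i", "iii"],
    ["ii", "iv", "iii", "i"],
    ["i", "iii", "iv", "ii"],
]


def is_latin_square(square):
    """True if every row and every column contains each treatment exactly once."""
    treatments = set(square[0])
    n = len(square)
    rows_ok = all(len(row) == n and set(row) == treatments for row in square)
    cols_ok = all(set(col) == treatments for col in zip(*square))
    return len(treatments) == n and rows_ok and cols_ok


if __name__ == "__main__":
    print(is_latin_square(YARN_X), is_latin_square(YARN_Y))  # True True
```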

First let us analyse these data as we did the crop yields in the
double Latin square of section 10.42, except that, since we are
considering the possibility of yarn-treatment interactions, we will keep
separate the four treatments in each block, giving eight duplicated
treatments with six degrees of freedom (three within each square). The
analysis is in rows (1) to (6) of Table 10.9. Rows (2) to (5) would be
obtained if the two Latin squares were analysed separately and the
results combined. The variance due to yarns in Table 10.9 may not be
compared with the residual term, since loom and period variations
contribute to the yarn differences. The variance due to treatments in
row (4) is significantly greater than the residual, so we may analyse the
treatment effects further.

* A warp is a unit quantity of yarn sized uniformly.

The yarn effects may be eliminated by combining the results of
the two squares. When this is done, the totals of the breakage rates
for the four basic treatments are:

                       Ingredient
                       A         B
1 unit .. ..         32.5      21.4
2 units .. ..        26.7      14.4


This is a 2 × 2 table of the kind analysed in section 10.3, and may
be analysed into parts due to quantity, ingredient and the residual,
each with one degree of freedom. Here the residual is the
interaction between quantity and ingredient. For inclusion in Table 10.9
and comparison with the sums of squares given there, the sums of
squares of this subsidiary analysis must be divided by 8, the number
of readings per total. The same results would be obtained if the
analysis were performed on the treatment means and the sums of
squares were multiplied by eight. This is equivalent to performing
the summations over all individual observations, and follows the
lines of previous analyses. The results are in rows (7), (8) and (9)
of Table 10.9.
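The subsidiary 2 × 2 analysis amounts to three single-degree-of-freedom contrasts among the four totals. The sketch below divides each squared contrast by $4 \times 8 = 32$, which is the same as analysing the totals and dividing by 8. It uses 21.4 for ingredient B at 1 unit, the value consistent with the quantity-yarn and ingredient-yarn totals given below and with rows (7) and (8) of Table 10.9. The function name `two_by_two` is illustrative. The results are 5.120, 17.111 and 0.045; Table 10.9 prints 0.046 for the last.

```python
def two_by_two(a1, b1, a2, b2, per_total):
    """Single-degree-of-freedom sums of squares for a 2 x 2 table of totals.

    Layout:        A    B
        1 unit    a1   b1
        2 units   a2   b2

    Each total is the sum of `per_total` readings, so each squared contrast
    is divided by 4 * per_total, the number of readings in the whole table.
    """
    n = 4 * per_total
    rows = (a1 + b1 - a2 - b2) ** 2 / n    # quantity
    cols = (a1 - b1 + a2 - b2) ** 2 / n    # ingredient
    inter = (a1 - b1 - a2 + b2) ** 2 / n   # quantity x ingredient
    return rows, cols, inter


if __name__ == "__main__":
    q, ing, qi = two_by_two(32.5, 21.4, 26.7, 14.4, per_total=8)
    print(f"quantity {q:.3f}, ingredient {ing:.3f}, interaction {qi:.3f}")
```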

To obtain the quantity-yarn interactions, we may eliminate the
ingredient effects and obtain the following totals:

                       Yarn
                       X         Y
1 unit .. ..         33.7      20.2
2 units .. ..        25.7      15.4


The analysis of these totals with the sums of squares divided by
eight gives variances due to yarn [row (1) of Table 10.9], quantity
[row (7)] and the quantity-yarn interaction [row (10)].

Similarly, the ingredient-yarn interaction entered in row (11) of
Table 10.9 may be obtained from the following totals:

                       Ingredient
                       A         B
Yarn X .. ..         39.1      20.3
Yarn Y .. ..         20.1      15.5
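The same contrast arithmetic handles the two yarn tables. This sketch (helper names are illustrative) gives 17.701 for yarns, 0.320 for quantity × yarn and 6.301 for ingredient × yarn, agreeing with rows (1), (10) and (11) of Table 10.9 to within one unit in the third decimal.

```python
def column_ss(t11, t12, t21, t22, per_total):
    """Sum of squares between the two columns of a 2 x 2 table of totals."""
    return (t11 - t12 + t21 - t22) ** 2 / (4 * per_total)


def interaction_ss(t11, t12, t21, t22, per_total):
    """Single-degree-of-freedom interaction sum of squares for the same table."""
    return (t11 - t12 - t21 + t22) ** 2 / (4 * per_total)


# Rows 1 unit, 2 units; columns yarn X, yarn Y (8 readings per total)
QUANTITY_YARN = (33.7, 20.2, 25.7, 15.4)
# Rows yarn X, yarn Y; columns ingredient A, ingredient B
INGREDIENT_YARN = (39.1, 20.3, 20.1, 15.5)

if __name__ == "__main__":
    print(f"yarns               {column_ss(*QUANTITY_YARN, 8):.3f}")
    print(f"quantity x yarn     {interaction_ss(*QUANTITY_YARN, 8):.3f}")
    print(f"ingredient x yarn   {interaction_ss(*INGREDIENT_YARN, 8):.3f}")
```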
The second-order interaction between the three factors is most
easily obtained by subtracting the treatment effects so far determined
[rows (7) to (11)] from the total effect in row (4). This is given in row (12).
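Row (12) and the significance tests can be reproduced from the printed sums of squares. In this sketch each one-degree-of-freedom component is divided by the residual variance 0.712 and compared with 4.75, the 5 per cent point of the variance ratio for 1 and 12 degrees of freedom taken from standard tables (an assumption of this example, not a figure from the text). The names are illustrative.

```python
RESIDUAL_VARIANCE = 0.712   # row (5) of Table 10.9, 12 degrees of freedom
TREATMENT_SS = 28.978       # row (4), 6 degrees of freedom
F_5_PER_CENT = 4.75         # 5% point of the variance ratio for 1 and 12 d.f.

# Rows (7) to (11): each has one degree of freedom
components = {
    "quantity": 5.120,
    "ingredient": 17.111,
    "quantity x ingredient": 0.046,
    "quantity x yarn": 0.320,
    "ingredient x yarn": 6.302,
}
# Row (12): what is left of the treatment sum of squares
components["quantity x ingredient x yarn"] = TREATMENT_SS - sum(components.values())


def significant(comps, residual=RESIDUAL_VARIANCE, critical=F_5_PER_CENT):
    """Names whose variance ratio (SS on 1 d.f. over the residual) exceeds `critical`."""
    return {name for name, ss in comps.items() if ss / residual > critical}


if __name__ == "__main__":
    for name, ss in components.items():
        print(f"{name:30s} {ss:7.3f}  ratio {ss / RESIDUAL_VARIANCE:6.2f}")
    print(sorted(significant(components)))
```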

The variances in rows (7) to (12) of Table 10.9 are due to the 
various treatment effects plus the experimental error, so that they 
may all be tested against the variance in row (5) for significance. 
The main effects of quantity and ingredient, and the interaction 
between ingredient and yarn are significant, and we may express
the final results by the following mean values of the breakage rates
per 10 000 picks : 


1 unit .. .. .. .. .. .. .. 3.4
2 units .. .. .. .. .. .. . 2.6

Ingredient A, yarn X .. .. 4.9
Ingredient A, yarn Y .. .. 2.5
Ingredient B, yarn X .. .. 2.5
Ingredient B, yarn Y .. .. 1.9


For comparing the quantities, since there are 16 readings per mean,
the standard error of the difference is $\pm\sqrt{2 \times 0.712/16} = \pm 0.30$,
and the corresponding standard error of a difference between ingredients
on one yarn is $\pm\sqrt{2 \times 0.712/8} = \pm 0.42$. The difference
between ingredients on yarn Y is not significant.
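The standard errors and the comparison of ingredients on each yarn follow directly from the residual variance. This sketch uses 2.18, the 5 per cent point of $t$ for 12 degrees of freedom from standard tables, as its criterion; the helper names are illustrative, and the quantity totals are the row sums of the quantity-yarn table. On yarn X the difference between ingredients is more than five times its standard error, while on yarn Y it is about 1.4 times.

```python
import math

RESIDUAL_VARIANCE = 0.712   # row (5) of Table 10.9
T_5_PER_CENT = 2.18         # 5% point of t for 12 degrees of freedom


def se_difference(readings_per_mean, variance=RESIDUAL_VARIANCE):
    """Standard error of the difference between two means of equal size."""
    return math.sqrt(2 * variance / readings_per_mean)


# Totals from the text: quantity totals have 16 readings each, the others 8.
QUANTITY = {"1 unit": 33.7 + 20.2, "2 units": 25.7 + 15.4}
INGREDIENT_YARN = {("A", "X"): 39.1, ("B", "X"): 20.3,
                   ("A", "Y"): 20.1, ("B", "Y"): 15.5}


def ingredient_t(yarn):
    """t-ratio for the A - B difference in mean breakage rate on one yarn."""
    diff = (INGREDIENT_YARN[("A", yarn)] - INGREDIENT_YARN[("B", yarn)]) / 8
    return diff / se_difference(8)


if __name__ == "__main__":
    means = {k: round(v / 16, 2) for k, v in QUANTITY.items()}
    print(means, f"s.e. of difference {se_difference(16):.2f}")
    for yarn in ("X", "Y"):
        t = ingredient_t(yarn)
        verdict = "significant" if abs(t) > T_5_PER_CENT else "not significant"
        print(f"yarn {yarn}: t = {t:.2f}, {verdict}")
```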

Thus, the final conclusions from these experiments are that the
use of two units of either ingredient instead of one reduces the
warp breakage rate slightly on both yarns, that on yarn X ingredient
B gives a considerably lower breakage rate than A and that
on yarn Y there is little or no difference in response to the two
ingredients. It is interesting to note that the breakage rate is higher
on yarn X than Y, and although we have not tested to see whether
this is due to the yarns or to loom and period variations, it may be
that the higher breakage rate is necessary to give sensitivity to the
form of ingredient.

The computations for this example may be readily performed by 
the method of section 10 .j, squaring and summing the various totals 
and dividing by the numbers of original observations in the totals. 
Before combining the results of the two Latin squares, it is well to 
perform the analyses separately to see if the two residual variances 
differ significantly. If they do, the tables should not be combined. 
We have ascertained that the two residuals are substantially the
same, but have not given the results.

When possible, it is advisable to have more variations of each
factor than were used in this experiment. With so few degrees of
freedom only marked effects are significant. More factors may be
superimposed and higher order interactions investigated. Fortu- 
nately interactions of a high order are seldom significant; if they 
were often of importance experimental investigations would be very 
cumbersome and generalisations would be difficult. 

Statistically, an interaction between factors appears as a residual
in an analysis, and in the same way any residual may be regarded
as an interaction. For example the residual term in Table 10.9
arising from the first analysis of Table 10.8 is in fact due to experi-
mental errors and interactions between treatments, looms and
periods. If a trial were made with uniformly sized warps, the
residual would be less than that of Table 10.9 by the variance due to 
these interactions. Where such interactions exist, the assumptions 
underlying the analysis of a Latin square are not satisfied and it 
is advisable to arrange the treatments in random groups. 

General Discussion 

10.6. Technical literature, particularly that relating to agriculture,
horticulture, forestry, animal breeding and other branches of applied 
biology, abounds in examples of the application of the foregoing 
arrangements and developments of them to practical experiments. 
A fairly detailed account of the subject treated from the points of 
view of the agriculturist is given by Sanders and Wishart (1935). 
Practically the whole of this experimental technique owes its
inspiration to Professor R. A. Fisher, and readers are strongly recommended
to refer to his book The Design of Experiments (1936a). There,
Professor Fisher deals with the fundamental principles of experi- 
mental procedure and treats of various methods in detail. 

Of the possible methods of arrangement, the most suitable will 
depend on the particular field of inquiry, on technical considera- 
tions and on the nature and extent of the variations in the untreated 
subject of experiment. In this connection, statistical surveys or 
uniformity trials are very valuable. 

Results obtained from any particular experiment are subject to 
the obvious limitation that they do not necessarily apply under 
conditions other than those obtaining for the trial. For example, in 
a cereal variety trial, variety A may be significantly better than
variety B, but it does not follow that A will always be better than B,
e.g. in different localities and weathers, nor even that A will be
better than B on the average. If it is desired to know whether A
gives a better yield than B on the average for all years and for
several localities, the trial must be repeated in those localities and 
for several years, and the mean difference between the varieties be 
compared with a standard error obtained, not from a mean residual 
variance within the trials but from a residual variance between the 
trials. This point needs emphasis, because it may happen in practice 
that a highly significant result based on a single trial has only a 
narrow range of application. 

This is one reason why multiple-factor experiments such as that 
exemplified in section 10.5 are advocated. The average effect of 
increasing the amount of the ingredients investigated there is 
established not only for one ingredient, but for two ingredients 
and yarns, and the applicability of the results is correspondingly
widened.

In all the arrangements we have considered in this chapter, the 
various factors have been balanced so that as far as possible each
individual of any one factor is associated once with each individual of
another. This has made the analysis much easier than it would
otherwise have been. Equations (10.5) and (10.8) were much
simplified because certain product terms were zero, so that the deviations
due to each factor could easily be separated and estimated
independently. This property is termed orthogonality and is highly
desirable.

Sometimes, however, accidents occur and individual readings are 
lost. In many such instances the missing readings may be estimated 
by the method of least squares, so that the analysis of variance is 
still possible. In some experiments, too, a complete arrangement 
involving every combination of all factors may be unwieldy and
unnecessary. Then certain combinations may be omitted, and the
corresponding treatments are said to be confounded. For example,
in the arrangement of Table 10.8, the yarn effect is confounded with
the loom and period effects, and there is no possibility of measuring
the interactions. This aspect is dealt with by Fisher (1936b), and
the treatment of a number of incomplete and confounded
arrangements is worked out in a number of papers by Yates (1933 and
1936).

Whether the experimental arrangement is balanced or unbalanced,
it should be carefully designed, otherwise serious inefficiency may
result or two important effects may be confounded and so be
unmeasurable. The time to consult statistical principles is before
the experiment is planned, not after the results are obtained and
are in confusion.

It is well to emphasise again that all the tests of significance and 
estimates of corrected variance in this chapter are based on the 
assumption of a homogeneous residual variation; i.e. it is assumed
that the residual variance is, within the limits of random errors, the 
same for all blocks, rows, columns and factors that are combined. 



CHAPTER XI 


MULTIPLE AND PARTIAL CORRELATION AND REGRESSION


Partial Correlation

11.1. We showed in the last chapter how heterogeneity of
variability has an important effect on the theory of errors, and in
this chapter we shall study its effect on correlation.

TABLE 11.1

Grain and Straw Yields

              Block 1         Block 2         Block 3         Block 4
Treatment   Grain Straw     Grain Straw     Grain Straw     Grain Straw
A            630   242       646   321       681   261       644   317
B            644   267       745   382       542   201       711   316
C            533   215       713   330       686   298       688   3?1
D            601   312       693   292       685   265       714   255
E            664   322       693   370       666   284       516   323
F            514   200       637   261       697   259       710   361
G            550   260       708   318       663   266       673   340
H            521   203       661   275       594   207       730   331

              Block 5         Block 6         Block 7         Block 8
Treatment   Grain Straw     Grain Straw     Grain Straw     Grain Straw
A            706   255       615   331       552   316       726   295
B            705   280       637   285       543   200       646   309
C            692   300       612   294       635   256       748   284
D            699   338       697   309       701   283       746   324
E            656   332       663   393       657   351       653   363
F            633   234       595   258       697   306       712   376
G            671   362       626   400       655   276       671   385
H            625   229       644   266       745   276       747   328

Table 11.1 shows the mean grain and straw yields for each of
64 plots (the units do not concern us here), there being eight different
manurial treatments and eight replicates on each. The data are 
selected from a paper by Eden and Fisher (1927), and are merely 
used here for illustrative purposes; readers who are interested in the 
agricultural aspect must consult the original paper. Further, we 
shall for the moment ignore the fact that the sample is small with
comparatively large sampling errors, and shall treat the statistical 
constants as though they were almost exact. The treatments were 
distributed at random within blocks and we have in Table 11.2 
analysed the variance of grain and straw yields into three parts ; the 
fifth, eighth and ninth columns should be ignored for the moment. 
The block variation is much greater than the residual for both 
grain and straw, and so is the treatment variation for straw, while 
the variance of grain yields is less for the treatments than for 
the residual (although not significantly so); thus the variations in 
the yields of Table 11.1 are heterogeneous. Now in order to find the
relation between grain and straw yields, we may correlate the
sixty-four pairs of readings, and the crude correlation of the actual readings
gives a coefficient of +0.524, indicating a positive relationship.
This, however, is not the whole story; the variations are produced 
by three groups of causes, changes in soil fertility, changes in treat- 
ments and that complex of unknown causes which we call random, 
and it is unlikely that the relationship between grain and straw 
yields is the same for the three types of variation. Indeed, in 
a general way, plots that produce most grain might be expected 
to produce most straw; but the treatments which had an effect 
on the straw yield had none on the grain, and this relationship 
cannot hold for treatment variations. It is thus necessary to 
separate out the effects and to find several correlations, and this 
is done by a fairly straightforward extension of the analysis of 
variance. 

Let us suppose first that the block and residual variations are 
due to the same thing (soil variability), and that we wish to separate 
them from those due to manurial treatments, and to correlate them. 
This may be done quite straightforwardly by finding the 64 pairs 
of deviations from the treatment means and correlating them, by 
finding the sums of their products and squares. We may also correlate 
the eight pairs of treatment means (or totals). Using the same
notation as before, but calling the grain yield $y$ and the straw yield $x$,
the values of an individual pair of readings in the $s$th treatment as
deviations from the grand means are

$$(x - \bar{x}) = (x - \bar{x}_s) + (\bar{x}_s - \bar{x})$$

and

$$(y - \bar{y}) = (y - \bar{y}_s) + (\bar{y}_s - \bar{y}), \qquad (11.1)$$



TABLE 11.2. Analysis of Variance and Co-variance of Grain and Straw Yields
and their product, summed for the $s$th treatment, is

$$S'(x-\bar{x})(y-\bar{y}) = S'(x-\bar{x}_s)(y-\bar{y}_s) + n_s(\bar{x}_s-\bar{x})(\bar{y}_s-\bar{y}) + (\bar{y}_s-\bar{y})\,S'(x-\bar{x}_s) + (\bar{x}_s-\bar{x})\,S'(y-\bar{y}_s),$$

where $n_s$ is the number of plots per treatment. The last two terms
are zero since $S'(x-\bar{x}_s)$ and $S'(y-\bar{y}_s)$ are the sums of deviations
from the mean and are zero, so this equation, when summed further
over all treatments, gives

$$S(x-\bar{x})(y-\bar{y}) = SS'(x-\bar{x}_s)(y-\bar{y}_s) + Sn_s(\bar{x}_s-\bar{x})(\bar{y}_s-\bar{y}). \qquad (11.11)$$

The term on the left-hand side is the sum of products used in 
correlating the crude deviations from the grand mean, the first term 
on the right-hand side is used in correlating the deviations from the 
treatment means and the second term in correlating the treatment 
means themselves. This equation is exactly parallel to equation (6.6) 
on p. 132 for the analysis of variance, and may similarly be entered 
in a table of analysis of co-variance. The degrees of freedom are 
reckoned up in the same way, and the sum of products divided by the 
number of degrees of freedom gives the mean product or co-variance. 
The sums of squares and variances can be entered in the same 
table, and the co-variance of any cause of variation when divided
by the square root of the product of the variances [equation (7.4), 
p. 165] gives the corresponding correlation coefficient. 
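Equation (11.11) can be checked numerically: the total sum of products always equals the within-treatment part plus the between-treatment part. The data below are invented for illustration (three treatments of three plots each), chosen so that the within-treatment correlation is positive while the treatment means move in opposite directions. The function names are illustrative.

```python
import math


def split_products(groups):
    """Split sums of products as in equation (11.11).

    groups: one list of (x, y) pairs per treatment.
    Returns (total, within, between), where total = within + between.
    """
    pairs = [p for g in groups for p in g]
    N = len(pairs)
    mx = sum(x for x, _ in pairs) / N
    my = sum(y for _, y in pairs) / N
    total = sum((x - mx) * (y - my) for x, y in pairs)
    within = between = 0.0
    for g in groups:
        n_s = len(g)
        gx = sum(x for x, _ in g) / n_s
        gy = sum(y for _, y in g) / n_s
        within += sum((x - gx) * (y - gy) for x, y in g)
        between += n_s * (gx - mx) * (gy - my)
    return total, within, between


def correlations(groups):
    """Within-treatment and between-treatment correlation coefficients."""
    _, wxy, bxy = split_products(groups)
    _, wxx, bxx = split_products([[(x, x) for x, _ in g] for g in groups])
    _, wyy, byy = split_products([[(y, y) for _, y in g] for g in groups])
    return wxy / math.sqrt(wxx * wyy), bxy / math.sqrt(bxx * byy)


# Invented data: three treatments, three plots each, as (straw x, grain y)
GROUPS = [
    [(2.0, 5.1), (3.0, 5.9), (4.0, 6.8)],
    [(3.0, 4.0), (4.0, 5.2), (5.0, 5.9)],
    [(5.0, 3.1), (6.0, 4.2), (7.0, 4.8)],
]

if __name__ == "__main__":
    print(split_products(GROUPS))
    print("within r = %.3f, between r = %.3f" % correlations(GROUPS))
```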

We can deal with the co-variance in exactly the same way as 
the variance, using all the arguments and equations previously used, 
except that co-variance may sometimes be negative while variance 
must always be positive. We saw in the last chapter that variance 
may be analysed into more than two parts, and we will now use the
same methods to analyse the co-variance of the yields of Table 11.1
into three parts. For a single plot in the $s$th treatment and $t$th block,

$$(x - \bar{x}) = (\bar{x}_s - \bar{x}) + (\bar{x}_t - \bar{x}) + x',$$
$$(y - \bar{y}) = (\bar{y}_s - \bar{y}) + (\bar{y}_t - \bar{y}) + y',$$

where $x'$ and $y'$ are residual deviations; and by a similar argument
to that used above, summing over all treatments and blocks,

$$S(x-\bar{x})(y-\bar{y}) = Sn_s(\bar{x}_s-\bar{x})(\bar{y}_s-\bar{y}) + Sn_t(\bar{x}_t-\bar{x})(\bar{y}_t-\bar{y}) + Sx'y', \qquad (11.2)$$
where $n_s$ is the number of plots in a treatment and $n_t$ the number
in a block. This is exactly parallel to equation (10.5), p. 215,* and the
terms have been entered in Table 11.2 in the sums of products
column and, when divided by the degrees of freedom, as
co-variances. For convenience of computing, equation (7.61), p. 166,
may be applied to find the terms of the above; i.e.

$$S(x-\bar{x})(y-\bar{y}) = Sxy - N\bar{x}\bar{y},$$
$$Sn_s(\bar{x}_s-\bar{x})(\bar{y}_s-\bar{y}) = Sn_s\bar{x}_s\bar{y}_s - N\bar{x}\bar{y},$$

and so on; but particular care must be taken of signs, as co-variance
may be negative. The correlation coefficients in Table 11.2 have
been found from the sums of squares and products, and whereas
the coefficient for block and residual variations is about +0.6, for
the treatment variations it is $-0.3$; the difference is important.
The residual deviations are independent of block and treatment 
differences, and their correlation coefficient is a partial coefficient 
with both block and treatment factors eliminated. The eight treat- 
ment totals are independent of the blocks and the block totals are 
independent of the treatments, but both are to some extent affected 
by the residual deviations, and although their correlation coefficients 
are in some degree partial, only one factor has been eliminated. 
However, as there are several plots per block and treatment, the 
effects of these two factors predominate over the residual in the 
correlation coefficients. 
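For the two-way layout of equation (11.2), the computing forms $Sn_s\bar{x}_s\bar{y}_s - N\bar{x}\bar{y}$ give the treatment and block sums of products directly, and the residual follows by difference. The sketch assumes one plot per treatment in each block, as in Table 11.1; the usage example runs on invented data, and `two_way_products` is an illustrative name.

```python
def two_way_products(x, y):
    """Sums of products for treatments, blocks and residual, as in (11.2).

    x, y: lists of rows (one row per treatment, one column per block, a
    single plot in each cell). Uses the computing forms
    S n_s xbar_s ybar_s - N xbar ybar; the residual is found by difference.
    """
    t, b = len(x), len(x[0])
    N = t * b
    correction = N * (sum(map(sum, x)) / N) * (sum(map(sum, y)) / N)
    row_x = [sum(r) / b for r in x]
    row_y = [sum(r) / b for r in y]
    col_x = [sum(x[i][j] for i in range(t)) / t for j in range(b)]
    col_y = [sum(y[i][j] for i in range(t)) / t for j in range(b)]
    total = sum(x[i][j] * y[i][j] for i in range(t) for j in range(b)) - correction
    treatments = sum(b * mx * my for mx, my in zip(row_x, row_y)) - correction  # n_s = b
    blocks = sum(t * mx * my for mx, my in zip(col_x, col_y)) - correction      # n_t = t
    return treatments, blocks, total - treatments - blocks


if __name__ == "__main__":
    # Invented data: 3 treatments (rows) x 4 blocks (columns)
    X = [[4.0, 5.0, 6.5, 5.5], [3.0, 4.5, 5.0, 4.0], [6.0, 6.5, 8.0, 7.5]]
    Y = [[7.0, 8.5, 9.0, 8.0], [6.5, 7.0, 8.5, 7.5], [6.0, 7.5, 8.0, 7.0]]
    print(two_way_products(X, Y))
```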

The maximum and minimum temperatures in Table 7.4, p. 147, 
are daily readings taken in the month of August for 49 years, and 
we may now be led to inquire if the variations between and within 
years are of the same nature. Fisher and Hoblyn give the sums of
squares and products, and we repeat them in Table 11.3, together
with the correlation coefficients.† The variances show that the
between-year variations are real, and the correlation coefficients
show that the association between maximum and minimum
temperatures is stronger for these variations than for those within the
year; that is to say, if we know the average maximum temperature
for the month of August in any one year we can estimate the average 
minimum for that month with a little greater accuracy than we can 
the minimum for any one day, knowing the corresponding maximum. 

* If $n_s$ and $n_t$ are constant, they may be placed outside the summation sign.

† The data of Table 11.3 have been calculated from Fisher and Hoblyn’s
full correlation table and not from the condensed Table 7.4.
is given in the last row of Table 11.21. The variance due to treat-
ments is now significantly greater than the residual, so that after 
correction for the straw yield, the treatments have a significant effect 
on grain yield. This is partly because of the considerable reduction 
in the residual variance resulting from the important partial correla- 
tion between the two yields. We make no attempt to give a biological 
interpretation of this result. 

TABLE 11.21

Analysis of Variance of Grain Yield after Correction for Regression
on Straw Yield

Source of Variation          Sum of Squares    Degrees of Freedom    Variance
Total within blocks             123 825.1              55             2 251.4
Residual                         89 026.1              48             1 854.7
Treatments (difference)          34 799.0               7             4 971.3

If $a$ is the regression of grain on straw yield for the residual
deviations as estimated from the third row of Table 11.2, the
treatment means corrected for variations in straw yield are

$$\bar{y}_s(\text{corrected}) = \bar{y}_s - a(\bar{x}_s - \bar{x}).$$

Here, $a = 0.818\,91$.


If a were the true population value, the standard error of the differ- 
ence between any two corrected means would be calculable from 
the corrected residual variance in Table 11.21. As it is, the error 
in a also contributes somewhat to the standard errors of the corrected 
treatment means. It is for this reason that the variance due to treat- 
ments in Table 11.21 cannot be obtained directly from the corrected 
means. 
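The last row of Table 11.21 and the correction of treatment means can be sketched as follows. The sums of squares and degrees of freedom are those of the table, and $a = 0.81891$ is the regression coefficient as read from the text; the treatment means passed to `corrected_mean` are hypothetical. The variance ratio comes out at about 2.68 on 7 and 48 degrees of freedom, above the 5 per cent point of roughly 2.2 in standard tables. As the text notes, a corrected mean computed this way ignores the sampling error of $a$.

```python
# Sums of squares and degrees of freedom from Table 11.21
WITHIN_BLOCKS = (123_825.1, 55)
RESIDUAL = (89_026.1, 48)

treatment_ss = WITHIN_BLOCKS[0] - RESIDUAL[0]
treatment_df = WITHIN_BLOCKS[1] - RESIDUAL[1]
variance_ratio = (treatment_ss / treatment_df) / (RESIDUAL[0] / RESIDUAL[1])

A = 0.81891  # regression of grain on straw for the residual deviations


def corrected_mean(grain_mean, straw_mean, straw_grand_mean, a=A):
    """Treatment mean grain yield adjusted to the overall mean straw yield."""
    return grain_mean - a * (straw_mean - straw_grand_mean)


if __name__ == "__main__":
    print(f"treatments: SS {treatment_ss:.1f}, d.f. {treatment_df}, "
          f"variance ratio {variance_ratio:.2f}")
    # Hypothetical treatment: grain mean 650, straw mean 300, overall straw mean 290
    print(f"corrected mean: {corrected_mean(650.0, 300.0, 290.0):.2f}")
```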

A discussion of the practical aspects of this application of the 
analysis of co-variance technique to agricultural experiments is given 
by Fisher (1936), Sanders and Wishart (1935). 
Elimination of a Linear Factor

11.3. In section 11.1, when finding the partial correlation between
grain and straw yields with block and treatment effects eliminated, 
we measured the yields as deviations from discrete block and treat- 
ment means and correlated the deviations. Sometimes, however, the 
factor to be eliminated can be described quantitatively and its 
relation to the variates to be correlated can be expressed by regres- 
sion lines instead of by discrete means. Then the deviations from 
these lines may be correlated. We shall deal with partial correlation 
when the regression lines are straight. 

If the quantities are $x$, $y$ and $z$ and $z$ is to be eliminated, let the
values of $x$ and $y$ lying on the regression lines relating them to $z$
be $X_z$ and $Y_z$. Then equation (11.11) becomes

$$S(x-\bar{x})(y-\bar{y}) = S(x - X_z)(y - Y_z) + S(X_z - \bar{x})(Y_z - \bar{y}).$$

The deviations from the regression lines may be found explicitly,
and from them the partial correlation coefficient, or if the regressions
are linear, the following formula may be used,

$$r_{xy\cdot z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}, \qquad (11.3)$$

where $r_{xy\cdot z}$ is the partial correlation coefficient between $x$ and $y$ with
$z$ eliminated, and $r_{xy}$, $r_{xz}$ and $r_{yz}$ are the three total correlation
coefficients between $x$, $y$ and $z$. This equation is proved in the
appendix to this chapter. By interchanging the suffixes, the partials
between $x$ and $z$ and between $y$ and $z$ can be found from the
same formula. In using equation (11.3), tables by Miner (1922) of
$\sqrt{1 - r^2}$ for different values of $r$ will be found very useful.
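Equation (11.3) is a one-line function. Applied to the total correlations of Table 11.4, the sketch below reproduces the three age-eliminated coefficients in the upper part of Table 11.41 (+0.690, +0.724, +0.794) to three decimals. The dictionary layout and function names are illustrative.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z from equation (11.3)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Total correlations from Table 11.4: v = vital capacity, h = height,
# w = weight, a = age
TOTAL = {("v", "h"): 0.835, ("v", "w"): 0.851, ("v", "a"): 0.662,
         ("w", "a"): 0.701, ("h", "a"): 0.714, ("h", "w"): 0.897}


def r(p, q):
    """Look up a total correlation regardless of the order of the suffixes."""
    return TOTAL[(p, q)] if (p, q) in TOTAL else TOTAL[(q, p)]


def with_age_eliminated(p, q):
    return partial_r(r(p, q), r(p, "a"), r(q, "a"))


if __name__ == "__main__":
    for p, q in (("v", "h"), ("v", "w"), ("h", "w")):
        print(p, q, f"{with_age_eliminated(p, q):+.3f}")
```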

Mumford and Young (1923) give the correlation coefficients in
Table 11.4, based on measurements taken on i no boys; we will
assume the regressions to be linear. It would appear at first sight that
there is a strong tendency for tall boys to have a large vital capacity (the
correlation coefficient is 0.835), but the older boys tend to be taller
and (as might be expected) to have a larger vital capacity. It is reason-
able, therefore, to suppose that a good deal of the apparent association
between vital capacity and height may be due to the fact that the
taller boys are also older, and before the true connection can be found,
correction must be made for age. Similar arguments apply to the
correlations between vital capacity and weight and between height
and weight, and, using equation (11.3) to find the three partial
correlations with age eliminated, we obtain the results in the upper 
part of Table 11.41. The first three coefficients show that the rela- 
tionships between vital capacity, height and weight continue to exist, 
even though the age is kept constant, although they are a little less 

TABLE 11.4*

Correlation Coefficients

Vital capacity and height .. +0.835     Vital capacity and weight .. +0.851
Vital capacity and age .. .. +0.662     Weight and age .. .. .. .. .. +0.701
Height and age .. .. .. .. .. +0.714     Height and weight .. .. .. .. +0.897

* We only include a portion of the data in the paper.

strong. We are still, however, far from a complete knowledge ; part 
of the association between vital capacity and height may really be 
an indirect consequence of the fact that tall boys are also heavy, and 
weight may be the determining factor, or, conversely, height may 
have the direct and weight the indirect connection with vital capacity. 
Again we may use the formula of equation (11.3) to eliminate further,
and in turn, weight and height, and the results are in the lower part 
of Table 11.41. The correlations are now much reduced, but are 


TABLE 11.41

Partial Correlation Coefficients

Vital capacity and height (age eliminated) .. .. .. .. .. +0.690
Vital capacity and weight (age eliminated) .. .. .. .. .. +0.724
Height and weight (age eliminated) .. .. .. .. .. .. .. . +0.794

Vital capacity and height (weight and age eliminated) .. +0.271
Vital capacity and weight (height and age eliminated) .. +0.399


still real, and it appears that the association of vital capacity with 
height is a little less important than that with weight. Thus, by 
means of equation (11.3), any number of factors can be eliminated 
one at a time, and ultimate partial correlation coefficients can be 
found. In using the formula, it is advisable to retain more significant 
figures than will be required in the final answer, as errors due to 
too drastic approximation tend to multiply themselves. Mumford
and Young in their paper consider other factors (stem length and
girth), and the conclusions even of the second part of Table 11.41
cannot be regarded as final.
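Eliminating a second variable is a further application of (11.3) to the first-order partials. This sketch keeps the age-eliminated coefficients at full precision, as the text advises, and the variable names are illustrative. It gives about +0.399 for vital capacity and weight with height and age eliminated, matching Table 11.41, and about +0.276 for vital capacity and height with weight and age eliminated against the printed +0.271, so the third figure of that coefficient should be treated with some caution.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Equation (11.3): remove one further variable from three correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# First-order partials with age eliminated, from the Table 11.4 correlations,
# kept to full precision rather than rounded to three figures.
vh_a = partial_r(0.835, 0.662, 0.714)   # vital capacity, height
vw_a = partial_r(0.851, 0.662, 0.701)   # vital capacity, weight
hw_a = partial_r(0.897, 0.714, 0.701)   # height, weight

# Second step: eliminate weight (or height) from the age-free coefficients.
vh_wa = partial_r(vh_a, vw_a, hw_a)     # weight and age eliminated
vw_ha = partial_r(vw_a, vh_a, hw_a)     # height and age eliminated

if __name__ == "__main__":
    print(f"{vh_wa:+.3f} {vw_ha:+.3f}")
```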



Elimination of a Non-linear Factor 

11.3. If the regression between any two factors differs much from
linearity, equation (11.3) is not valid, although it may often be used
as a fair approximation when the departure from linearity, although
statistically significant, is not very great. It is always possible, of
course, to fit curved regression lines of x and y on z, to find the actual
deviations of x and y from those lines and to correlate the deviations;
but in such circumstances it is obviously desirable (and necessary
if tests of significance are to be applied) that both x and y should
be fitted in the same way and to the same degree. Any method of
smoothing may be used to determine the regression curve, but some
such form as the polynomial, which can be found by the method of
least squares, is preferable, so that the number of degrees of freedom
absorbed in fitting may be known. If the polynomial is used (say) to
the third degree in z, and x and y are the variables which are to be
partially correlated, it is permissible to regard x, y, z, z² and z³ as
five variables related linearly (finding the individual values of z²
and z³ from those of z), to correlate them in every possible way,
and then to eliminate z³, z² and z in turn by repeated applications
of equation (11.3) as shown in the last section. If there is only one
value of the dependent variable to every one of that which is to
be eliminated, and a polynomial can be fitted according to Fisher's
system as outlined in section 9.31, the operation of correlating
residuals is very easy; we deal with it in the next section.
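The equivalence claimed here can be checked numerically. The following is a minimal Python sketch on hypothetical data, not the book's. It fits cubics of x and y on z, correlates the residuals, and compares the result with treating x, y, z, z² and z³ as five variables and eliminating z³, z² and z in turn by the first-order formula (assumed, as in section 11.4, to be equation (11.3)). The two routes should agree to rounding error. The order of elimination does not affect the final value. All names are illustrative.

```python
import math
import random


def corr(u, v):
    """Product-moment correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def residuals(y, basis):
    """Least-squares residuals of y on the basis vectors (modified Gram-Schmidt)."""
    ortho = []
    for vec in basis:
        w = list(vec)
        for q in ortho:
            c = sum(a * b for a, b in zip(w, q)) / sum(b * b for b in q)
            w = [a - c * b for a, b in zip(w, q)]
        ortho.append(w)
    r = list(y)
    for q in ortho:
        c = sum(a * b for a, b in zip(r, q)) / sum(b * b for b in q)
        r = [a - c * b for a, b in zip(r, q)]
    return r


def partial_from_matrix(R, i, j, controls):
    """Partial correlation of variables i and j; the first entry of `controls`
    is eliminated first, by repeated use of the first-order formula (11.3)."""
    if not controls:
        return R[i][j]
    k, rest = controls[-1], controls[:-1]
    r_ij = partial_from_matrix(R, i, j, rest)
    r_ik = partial_from_matrix(R, i, k, rest)
    r_jk = partial_from_matrix(R, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


# Hypothetical data: x and y both curved in z, plus independent noise.
rng = random.Random(1)
z = [float(t) for t in range(1, 21)]
x = [0.02 * t ** 3 - 0.5 * t ** 2 + 3 * t + rng.gauss(0, 2) for t in z]
y = [-0.01 * t ** 3 + 0.3 * t ** 2 + rng.gauss(0, 2) for t in z]
z2 = [t ** 2 for t in z]
z3 = [t ** 3 for t in z]

# Route 1: fit cubics of x and y on z and correlate the residuals.
basis = [[1.0] * len(z), z, z2, z3]
r_resid = corr(residuals(x, basis), residuals(y, basis))

# Route 2: correlate x, y, z, z^2, z^3 in every way, then eliminate z^3, z^2, z in turn.
cols = [x, y, z, z2, z3]
R = [[corr(a, b) for b in cols] for a in cols]
r_partial = partial_from_matrix(R, 0, 1, [4, 3, 2])

print(f"correlation of cubic residuals:            {r_resid:+.6f}")
print(f"partial r with z^3, z^2, z eliminated (11.3): {r_partial:+.6f}")
```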

11.31. We found constants A, B, C, etc., of the polynomial [equation
(9.6), p. 197] and said that the sums of squares of the residuals were
found by subtracting in turn from the crude sum of squares of the
observations quantities NA², etc. (section 9.31); similarly the sums
of products of the residuals after fitting any number of terms of the
polynomial are obtained by subtracting successively from the crude
sum of products the quantities

$$
NA_1A_2,\qquad \frac{N(N^2-1)}{12}B_1B_2,\qquad \frac{N(N^2-1)(N^2-4)}{180}C_1C_2,\qquad \frac{N(N^2-1)(N^2-4)(N^2-9)}{2800}D_1D_2,\ \text{etc.} \qquad (11.4)
$$
(paying due regard to sign), where A₁, B₁, etc., are the constants of
the polynomial of the first factor on the eliminated one and A₂, B₂,
etc., are the constants of the polynomial relating the second factor
to the eliminated one. If a curve of zero order is fitted, i.e. if the
deviations are measured from the mean, the sum of products of
residuals is Sxy − NA₁A₂, and this is the ordinary method of
correcting to the mean by subtracting Nx̄ȳ; the subtraction of the
other terms for curves of higher order may be regarded as equivalent
to correcting to a moving mean. These product terms are just as
much part of the analysis of co-variance as the square terms are of
the analysis of variance, and we have inserted them in Table 11.5
for the two series of protein percentages, the original data being
in Table 8.1, p. 174, and the constants being given on p. 199;
Table 11.5 corresponds exactly to Table 9.4 (p. 200), and the two
should be read in conjunction. It would be a good and convincing
exercise if the reader were to determine the 58 protein contents
given by the two cubic regressions, find the actual residuals, and
see if their sums of squares and products agree with those in Tables
9.4 and 11.5. As before, the sum of products at any row may be
used with the corresponding sums of squares to determine a partial
correlation coefficient. We will now use these data to illustrate another
type of statistical problem.
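Here is a minimal Python sketch of equation (11.4) for a cubic, under these assumptions. Fisher's polynomials for N equally spaced points are taken to be ξ₁ = t − t̄, ξ₂ = ξ₁² − (N² − 1)/12 and ξ₃ = ξ₁³ − (3N² − 7)ξ₁/20, whose sums of squares are the multipliers in (11.4). Section 9.31 is not reproduced here, so these definitions are an assumption. Each constant (B, C, D) is found as S(vξ)/S(ξ²). The sketch compares the shortcut with the long way the text suggests as an exercise: computing every residual and multiplying. The series are hypothetical, not those of Table 8.1.

```python
def fisher_xi(N):
    """Orthogonal polynomials xi1, xi2, xi3 for N equally spaced points."""
    t = [k - (N + 1) / 2 for k in range(1, N + 1)]
    xi1 = t
    xi2 = [u * u - (N * N - 1) / 12 for u in t]
    xi3 = [u ** 3 - (3 * N * N - 7) / 20 * u for u in t]
    return [xi1, xi2, xi3]


def poly_constants(v, xis):
    """A (the mean), then B, C, D = S(v*xi)/S(xi^2) for each polynomial."""
    consts = [sum(v) / len(v)]
    for xi in xis:
        consts.append(sum(a * b for a, b in zip(v, xi)) / sum(b * b for b in xi))
    return consts


def residual_sp_eq_11_4(x, y):
    """Sum of products of residuals after a cubic, by subtracting the terms of (11.4)."""
    N = len(x)
    multipliers = [
        N,
        N * (N ** 2 - 1) / 12,
        N * (N ** 2 - 1) * (N ** 2 - 4) / 180,
        N * (N ** 2 - 1) * (N ** 2 - 4) * (N ** 2 - 9) / 2800,
    ]
    xis = fisher_xi(N)
    cx, cy = poly_constants(x, xis), poly_constants(y, xis)
    sp = sum(a * b for a, b in zip(x, y))  # crude sum of products
    for m, c1, c2 in zip(multipliers, cx, cy):
        sp -= m * c1 * c2
    return sp


def residual_sp_direct(x, y):
    """The same quantity found the long way: compute each residual, then multiply."""
    N = len(x)
    xis = fisher_xi(N)

    def resid(v):
        c = poly_constants(v, xis)
        return [v[k] - c[0] - sum(c[m + 1] * xis[m][k] for m in range(3)) for k in range(N)]

    return sum(a * b for a, b in zip(resid(x), resid(y)))


# Hypothetical annual series (not the data of Table 8.1).
x = [12.1, 12.6, 12.4, 13.0, 13.5, 13.2, 13.9, 14.4, 14.1, 14.8]
y = [11.9, 11.5, 11.7, 11.0, 10.8, 11.1, 10.4, 10.2, 10.5, 9.9]

sp_fast = residual_sp_eq_11_4(x, y)
sp_slow = residual_sp_direct(x, y)
r_resid = sp_fast / (residual_sp_eq_11_4(x, x) * residual_sp_eq_11_4(y, y)) ** 0.5
print(f"sum of products of residuals: {sp_fast:.6f} (direct: {sp_slow:.6f})")
print(f"correlation of residuals after cubics: {r_resid:+.3f}")
```

The same function applied to (x, x) or (y, y) gives the residual sums of squares, so the partial correlation after a cubic needs no residuals at all.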


Spurious Correlation of Time Series 

11.4. An investigator may ask if there is any common factor (perhaps
weather) which affects the protein contents of both series of corn in
the same way, and it is natural to use the correlation coefficient to
examine this question. The crude correlation coefficient as found
from the "totals" rows of Tables 9.4 and 11.5 is −0.638, and
would at first sight appear to indicate that if the common factor
affects protein content, it acts on the two series in opposite ways.
We have to remember, however, that because of the way in which
seed was selected for the two series, one increased in protein content
while the other decreased, and this predominating effect gives a
negative correlation coefficient. In order to investigate the effect of
the possible common factor, we must first eliminate the variation in
protein with time (due to seed selection), and then correlate the
residuals. First, we will assume as an approximation that the regres-
sions on time are linear, with correlation coefficients of +0.862
and −0.708, and, using equation (11.3) (time is z and the protein
contents are x and y), find the partial correlation to be

$$
r_{xy\cdot z} = \frac{-0.638 + 0.862 \times 0.708}{\sqrt{(1 - 0.862^2)(1 - 0.708^2)}} = -0.08,
$$

which, on such a small sample, is not significant. We have seen,
however, that the regressions of the two protein contents on time
depart considerably from linearity, and equation (11.3) is useful only
as an approximation; but we may use the analyses of Tables 9.4


TABLE 11.5

Analysis of Co-variance of Protein Content

Source of Variation          Degrees of Freedom     Sums of Products
First order regression               1                  −33.01
Second order regression              1                   −2.86
Third order regression               1                   −2.18
Residual                            25                   +3.52
Total                               28                  −34.53


and 11.5 to correlate the residuals after fitting cubic equations. This
correlation coefficient is

$$
r = \frac{+3.52}{\sqrt{12.34 \times 12.06}} = +0.29,
$$

and although still too small to be significant on such a small sample,
it does suggest that some common factor varying from year to year
may have been affecting the annual residual fluctuations of protein
yield in the two series in the same way.*
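The trap described above can be reproduced on made-up numbers. In this minimal Python sketch, two hypothetical 29-year series share a year-to-year factor, but one trends upward and the other downward. The crude correlation comes out negative, while the correlation of the residuals from straight-line trends is positive. For a linear trend, the first-order formula (taken, as above, to be equation (11.3)) gives exactly the residual correlation. The series, seed and function names are all illustrative assumptions.

```python
import math
import random


def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def detrend_linear(v, t):
    """Residuals of v from its straight-line regression on t."""
    mt, mv = sum(t) / len(t), sum(v) / len(v)
    slope = sum((ti - mt) * (vi - mv) for ti, vi in zip(t, v)) / sum((ti - mt) ** 2 for ti in t)
    return [vi - mv - slope * (ti - mt) for ti, vi in zip(t, v)]


# Hypothetical series: one rising, one falling, sharing a year-to-year factor.
rng = random.Random(42)
years = list(range(29))
common = [rng.gauss(0, 0.8) for _ in years]
x = [10.0 + 0.20 * t + c + rng.gauss(0, 0.2) for t, c in zip(years, common)]
y = [12.0 - 0.15 * t + c + rng.gauss(0, 0.2) for t, c in zip(years, common)]

r_crude = corr(x, y)
r_resid = corr(detrend_linear(x, years), detrend_linear(y, years))

# With a linear trend, (11.3) gives the same value as correlating the residuals.
r_xt, r_yt = corr(x, years), corr(y, years)
r_partial = (r_crude - r_xt * r_yt) / math.sqrt((1 - r_xt ** 2) * (1 - r_yt ** 2))

print(f"crude r = {r_crude:+.3f}, r of residuals = {r_resid:+.3f}, (11.3) gives {r_partial:+.3f}")
```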

Such problems, in which it is desired to correlate two series of
observations extended in time or space, have received a good deal of
attention from statisticians; the example in section 7.3 of the correla-
tion between age at death and the proportion of marriages contracted
in the Church of England comes into this category. Essentially the
problem is a special case of partial correlation, with time (or space)
as the variable to be eliminated, and the discussion has centred
round the development of suitable means of expressing the regres-
sion and obtaining residuals; there is most difficulty in economic
statistics where factors are apt to vary periodically (or in waves),
but for most biological data extending over only a few years, the
method we have outlined is adequate. Even if no such obvious effect
as that in the example just worked is present, the possibility of some
unsuspected time-effect in data collected over some period of time
should always be considered.

* We have treated the first observation in each of the two series as
independent, but they are actually the same protein content of a
common … This does not affect the constants much, nor, in this
instance, our conclusions.


Partial and Multiple Regression

11.5. With every partial correlation coefficient there goes a partial
regression coefficient, which is obtained by dividing the sum of
products by the appropriate sum of squares of the independent
variable, and which expresses the amount of change in one factor
associated with a second when the other factors are constant. The
regression of grain on straw yield (Table 11.2) for the residual
variations is the ratio of the residual sum of products to the residual
sum of squares of straw on p. 250, viz.

$$
\frac{58\,549.0}{71\,496.1} = 0.82
$$

unit of grain associated with a change of 1 unit of straw per plot.

When there are several quantities all linearly related, partial
regressions are best found by fitting a multiple regression equation.
If y is the dependent variable and x₁, x₂, x₃, … are the independent
variables, the linear regression of y on the x's is expressed by an
equation of the form*

$$
Y' = a x_1' + b x_2' + c x_3' + \dots \qquad (11.5)
$$

* x' and y' are deviations from the grand means.

Equation (11.5) is called the multiple regression formula of y on x₁, x₂,
etc., and the constants a, b, c, etc., are the partial regression
coefficients, showing how much unit changes in the individual x
variables affect y, independently and directly. The constants a, b, c,
etc., may be found by the method of least squares, which leads to
a series of linear equations,

$$
\begin{aligned}
Sy'x_1' &= aSx_1'^2 + bSx_1'x_2' + cSx_1'x_3' + \dots \\
Sy'x_2' &= aSx_1'x_2' + bSx_2'^2 + cSx_2'x_3' + \dots \\
Sy'x_3' &= aSx_1'x_3' + bSx_2'x_3' + cSx_3'^2 + \dots
\end{aligned}
\qquad (11.6)
$$


The summations have to be found, of course, from the data, and the
equations have to be solved for a, b, c, etc. These are not unlike
equations (9.4) given on p. 195 for determining the constants of a
curved regression line; and indeed the two processes are somewhat
the same, for in the earlier one we are finding the multiple regression
of y on x, x², x³, etc.

We shall illustrate this by finding from the data of Table 11.6
(Harris, 1931) a formula for predicting the loaf volume from the
protein content of the flour and the percentage of it extracted by
potassium bromide solution. The forty-four wheats were analysed
and subjected to the baking test to give the data of the table. If we
call the loaf volume y, and the protein percentages x₁ and x₂, the
sums of squares and products are as given in the following equations,

$$
\begin{aligned}
1\,462.33 &= 107.55\,a - 161.64\,b \\
-2\,022.85 &= -161.64\,a + 473.19\,b
\end{aligned}
$$

These, solved for a and b, lead to the regression equation,

loaf volume in c.c. = 14.7385 × crude protein per cent
                      + 0.7597 × protein extracted by KBr per cent,

all quantities being measured as deviations from their means.
If we were to use the crude protein percentage only, a would then be
the ordinary linear regression, and equal to

$$
\frac{1\,462.33}{107.55} = 13.6 \text{ c.c.}
$$

for a change of 1 per cent in protein.
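A minimal Python sketch of solving the two normal equations. Cramer's rule is used here purely for convenience; the text does not say how the solution was obtained. The sketch uses only the sums quoted above and should reproduce the constants 14.7385 and 0.7597 and the single-variable value of 13.6 c.c. The function name is illustrative.

```python
def solve_two(s11, s12, s22, sy1, sy2):
    """Solve the two normal equations
         sy1 = a*s11 + b*s12
         sy2 = a*s12 + b*s22
    i.e. equations (11.6) with two independent variables, by Cramer's rule."""
    det = s11 * s22 - s12 * s12
    a = (sy1 * s22 - s12 * sy2) / det
    b = (s11 * sy2 - s12 * sy1) / det
    return a, b


# Sums of squares and products quoted in the text for the 44 wheats (Table 11.6).
S_x1x1, S_x1x2, S_x2x2 = 107.55, -161.64, 473.19
S_yx1, S_yx2 = 1462.33, -2022.85

a, b = solve_two(S_x1x1, S_x1x2, S_x2x2, S_yx1, S_yx2)
a_alone = S_yx1 / S_x1x1  # ordinary regression on crude protein only

print(f"loaf volume = {a:.4f} x crude protein + {b:.4f} x KBr-extracted protein")
print(f"crude protein alone: {a_alone:.1f} c.c. per 1 per cent")
```

With more than two x's, the same equations would be solved by successive elimination, carrying many figures, as the text stresses in section 11.73.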


The Multiple Correlation Coefficient

11.6. Just as we need the correlation coefficient to give an idea of
the accuracy of a linear regression formula with a single term, so we
need a constant to specify the accuracy of a multiple regression
formula, and this is provided by the multiple correlation coefficient.
This may be found by correlating the predicted values (Y) given by
the formula with the actual ones (y); or alternatively, the sum of
squares of the deviations of y from the regression is (1 − R²) multi-
plied by the sum of squares of the deviations of y from the grand
mean, where R is the multiple correlation coefficient. Thus, multiple
correlation is another case of the analysis of variance, analysing

TABLE 11.6

Loaf     Crude      Protein          Loaf     Crude      Protein
Volume   Protein,   Extracted by     Volume   Protein,   Extracted by
(c.c.)   per cent   KBr, per cent*   (c.c.)   per cent   KBr, per cent*

540      12.0       31.2             472      12.6       21.8
475      11.5       31.4             500      10.4       31.6
450      11.0       31.3             520      13.8       28.8
455      12.6       27.9             525      13.6       26.7
465      11.9       31.4             490      11.5       30.8
470      12.0       30.2             505      10.6       31.5
440      11.7       31.9             475      10.7       31.5
490      12.1       31.8             545      14.2       28.8
475      12.9       31.0             534      13.4       27.4
465      11.3       31.7             490      11.0       29.3
520      14.7       28.3             507      10.8       29.4
525      15.2       20.5             530      12.8       28.2
450      10.5       28.7             505      12.2       29.7
500      13.1       27.7             495      12.5       28.9
445      10.1       31.8             515      12.4       30.2
443      11.2       28.9             505      14.7       21.4
463      11.3       25.5             535      14.5       22.7
446      11.5       30.0             445      9.0        34.9
500      12.9       28.5             470      11.2       29.7
487      10.4       25.5             435      9.1        34.0
502      12.7       25.5             475      10.8       29.2
529      16.0       21.8             505      13.9       26.0


* This is the percentage of the crude protein extracted by KBr. 


that of y into two parts, one associated with the regression and the
other a residual, and the degrees of freedom of the former are equal
to the number of terms in the regression equation. We have entered
these terms in Table 11.7, assuming the size of sample to be N and
the number of terms [x's in equation (11.5)] to be m'. If there is
only one term on the right-hand side of the regression equation,
m' = 1, and Table 11.7 reduces to the form for the correlation of
two variables (Table 7.6, p. 158), where R = r. On the other hand,
if we compare Tables 11.7 and 9.2 (p. 193) we see that they are of
essentially the same form, R² corresponding to η² and m' to (m − 1),
N being the same; thus, the multiple correlation coefficient and the
correlation ratio are essentially analogous constants, both measuring
association, the former between one variate and a number of others
as expressed by a multiple linear regression formula, and the latter
between one variate and another expressed as a series of array means
or as a curved regression line. We have already pointed out the
analogy between the multiple regression formula and the curved
regression line, and now see that it is complete. Like the correlation
ratio, the multiple correlation coefficient is essentially positive, and
becomes larger as the number of degrees of freedom associated with
it (m') increases; indeed, if there were as many constants as there
are independent deviations of y in the sample (m' = N − 1), R
would equal unity. Hence, all that we have written about the corre-
lation ratio is true of the multiple correlation coefficient.

TABLE 11.7
Analysis of Variance

Source of Variation      Sums of Squares      Degrees of Freedom
Regression               R²Sy'²               m'
Residual                 (1 − R²)Sy'²         N − m' − 1
Total                    Sy'²                 N − 1
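The two routes to R given at the start of section 11.6 can be checked with a small Python sketch. One route correlates the predicted values Y with the observed y. The other reads R² off the residual sum of squares, which is (1 − R²)Sy'². As illustrative input it takes only the first ten rows of the left-hand half of Table 11.6 (as laid out above), not all 44 wheats. The value of R itself therefore means nothing here; only the agreement between the two routes matters, and a transcription slip in the inputs would not affect it. The helper names are hypothetical.

```python
import math


def deviations(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def S(u, v):
    """Sum of products, S(uv), of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Illustrative inputs: first ten rows of the left-hand half of Table 11.6.
y = [540, 475, 450, 455, 465, 470, 440, 490, 475, 465]             # loaf volume
x1 = [12.0, 11.5, 11.0, 12.6, 11.9, 12.0, 11.7, 12.1, 12.9, 11.3]  # crude protein
x2 = [31.2, 31.4, 31.3, 27.9, 31.4, 30.2, 31.9, 31.8, 31.0, 31.7]  # protein extracted by KBr

d1, d2, dy = deviations(x1), deviations(x2), deviations(y)
s11, s12, s22 = S(d1, d1), S(d1, d2), S(d2, d2)
sy1, sy2, syy = S(dy, d1), S(dy, d2), S(dy, dy)

# Normal equations (11.6) for two x's.
det = s11 * s22 - s12 * s12
a = (sy1 * s22 - s12 * sy2) / det
b = (s11 * sy2 - s12 * sy1) / det

Y = [a * u + b * v for u, v in zip(d1, d2)]  # predicted deviations, equation (11.5)
resid = [obs - pred for obs, pred in zip(dy, Y)]

R_from_correlation = S(Y, dy) / math.sqrt(S(Y, Y) * syy)  # correlate Y with y
R2_from_residuals = 1 - S(resid, resid) / syy             # residual SS = (1 - R^2) Sy'^2

print(f"R by correlating Y with y:      {R_from_correlation:.4f}")
print(f"R from residual sum of squares: {math.sqrt(R2_from_residuals):.4f}")
```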

In order to find R, it is not necessary to find the individual regres-
sion values of y; the following relation may be used instead:*

$$
SY'^2 = aSy'x_1' + bSy'x_2' + cSy'x_3' + \dots \qquad (11.7)
$$

For our wheat quality and loaf volume data,

$$
SY'^2 = R^2Sy'^2 = 14.7385 \times 1\,462.33 - 0.7597 \times 2\,022.85 = 20\,015.8,
$$

and since Sy'² = 41 246.8, R² = 0.4853 and R = 0.697.

* Squaring equation (11.5),

$$
\begin{aligned}
Y'^2 &= ax_1'[ax_1' + bx_2' + cx_3' + \dots] + bx_2'[ax_1' + bx_2' + cx_3' + \dots] \\
&\quad + cx_3'[ax_1' + bx_2' + cx_3' + \dots] + \dots \\
&= a[ax_1'^2 + bx_1'x_2' + cx_1'x_3' + \dots] + b[ax_1'x_2' + bx_2'^2 + cx_2'x_3' + \dots] \\
&\quad + c[ax_1'x_3' + bx_2'x_3' + cx_3'^2 + \dots] + \dots
\end{aligned}
$$

and summing,

$$
\begin{aligned}
SY'^2 &= a[aSx_1'^2 + bSx_1'x_2' + cSx_1'x_3' + \dots] + b[aSx_1'x_2' + bSx_2'^2 + cSx_2'x_3' + \dots] \\
&\quad + c[aSx_1'x_3' + bSx_2'x_3' + cSx_3'^2 + \dots] + \dots
\end{aligned}
$$

Equation (11.7) follows from this when the values of equations (11.6) are
substituted for the terms in the brackets.

Correlating the loaf volume with the crude protein content only,
we find r = +0.694, and thus see that although it absorbs one
more degree of freedom, the multiple regression equation is scarcely
better than that using protein content as a single factor. We deal
with tests for the significance of such a difference in section 11.72.
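The arithmetic of equation (11.7) for the loaf-volume data, as a short Python sketch using only the constants and sums quoted in the text. It finds SY'², then R², and the simple correlation with crude protein alone for comparison. The variable names are illustrative.

```python
import math

# Constants and sums quoted in the text for the loaf-volume data.
a, b = 14.7385, 0.7597
S_yx1, S_yx2, S_yy = 1462.33, -2022.85, 41246.8
S_x1x1 = 107.55

S_regression = a * S_yx1 + b * S_yx2  # equation (11.7): S Y'^2
R2 = S_regression / S_yy
r2_protein_only = S_yx1 ** 2 / (S_x1x1 * S_yy)  # square of the simple correlation

print(f"S Y'^2 = {S_regression:.1f}, R^2 = {R2:.4f}, R = {math.sqrt(R2):.3f}")
print(f"protein alone: r = {math.sqrt(r2_protein_only):.3f}")
```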

Errors of Sampling in Multiple and Partial Correlation 
Partial Correlation Coefficient 

11.71. Fisher (1924c) has shown that the distributions of the partial
and the total correlation coefficients are the same when they are
estimated from deviations having the same number of degrees of
freedom. Thus, if there are N pairs of observations and m factors
have been eliminated (leaving N − m − 1 degrees of freedom),
the partial correlation coefficient is distributed like the total
coefficient based on (N − m) pairs of observations.

The residual correlation coefficient between grain and straw yield
in Table 11.2 is based on 49 degrees of freedom and may be treated
like an ordinary correlation based on 50 pairs of observations, while
the correlation for treatment variations is based on 7 degrees of
freedom, and may be regarded as arising from a sample of 8; we
will use Fisher's z' transformation (section 8.2) to test the signifi-
cance of the difference between them. The values of z' are 0.681
and −0.347, with a difference of 1.028; the standard error of this is

$$
\pm\sqrt{\tfrac{1}{47} + \tfrac{1}{5}} = \pm 0.470,
$$

and since the difference is more than twice its standard error
(1.028/0.470 = 2.19), it is significant.
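The test can be written as a few lines of Python. The sketch uses the standard error 1/√(n − 3) for z' from n pairs, which is what the ±√(1/47 + 1/5) above implies. It also uses the relation z' = tanh⁻¹ r (Python's `math.atanh`), so `math.tanh` recovers the correlation coefficients behind the quoted z' values. Taking "about twice the standard error" (1.96) as the criterion is an assumption matching the usual 5 per cent level.

```python
import math

# Values of z' quoted in the text (Table 11.2 correlations, grain and straw).
z_residual, pairs_residual = 0.681, 50    # 49 degrees of freedom
z_treatment, pairs_treatment = -0.347, 8  # 7 degrees of freedom

difference = z_residual - z_treatment
standard_error = math.sqrt(1 / (pairs_residual - 3) + 1 / (pairs_treatment - 3))
ratio = difference / standard_error

# z' = atanh(r), so the corresponding correlation coefficients are tanh(z').
print(f"r values: {math.tanh(z_residual):+.3f}, {math.tanh(z_treatment):+.3f}")
print(f"difference {difference:.3f} +/- {standard_error:.3f}; ratio {ratio:.2f}")
```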

We found in section 11.4 that the partial correlation coefficient
between the residuals of protein contents of two series of corn after
the time trend had been eliminated by a cubic was +0.29, and
since there were 29 observations with four terms eliminated (in-
cluding the mean) we must test this as though it were based on
26 pairs of observations. Using Fisher's table of values of the corre-
lation coefficient at different levels, we should enter it at n = 24;
there is no value at this point, but we see that for n = 25, r = 0.323
lies on the 0.1 level, and our observed value, being less even than
this (the value for n = 24 would be slightly larger still), is not
significant.
For the same reasons as advanced in section 10.1 in connection
with the mean, the sampling error of the correlation coefficient is
only given by the usual formulae when the correlation is between
homogeneous and independent deviations arrived at after a complete
analysis of co-variance.


Multiple Correlation Coefficient

11.72. In order to test the significance of a multiple correlation
coefficient we may use the analogy with non-linear regression and
employ the same test. From the analysis of variance of Table 11.7
we find

$$
z = \tfrac{1}{2}\log_e \frac{R^2 (N - m' - 1)}{(1 - R^2)\, m'}
$$

and enter Fisher's tables at n₁ = m' and n₂ = (N − m' − 1).
Wishart (1928) gives a table of values of the multiple correlation
coefficient lying on the 0.05 and 0.01 levels of significance.

For the data of Table 11.6 (loaf volumes and wheat qualities),
R² = 0.485, m' = 2, N = 44, and

$$
z = \tfrac{1}{2}\log_e \frac{0.485 \times 41}{0.515 \times 2} = 1.48,
$$

and this is even greater than the value of z lying on the 1 per cent
level when n₁ = 2 and n₂ = 30 (i.e. greater than 0.84). Hence this
multiple correlation coefficient is significantly greater than zero.

We may also wish to test if a regression formula with a large
number of constants gives a significantly closer prediction than one
using fewer, and this may be done in exactly the same way as in
section 9.2, where we tested for non-linearity of regression. If there
are two regression formulae involving m₁' and m₂' constants, m₁' being
greater than m₂', and the multiple correlation coefficients are R₁ and
R₂, the complete analysis of variance is as shown in Table 11.71,
and since all the sums of squares are positive, R₁ can never be less
than R₂. If, however, the difference between R₁ and R₂ is the result
of a real effect, the variance for the difference between regressions
will be significantly greater than the residual, and using Fisher's
test, we see if

$$
z = \tfrac{1}{2}\log_e \frac{(R_1^2 - R_2^2)(N - m_1' - 1)}{(1 - R_1^2)(m_1' - m_2')}
$$

is significant for degrees of freedom n₁ = m₁' − m₂' and n₂ = N − m₁' − 1.

Applying this to the data of Table 11.6, r² for protein content
alone = 0.482 (m' = 1), while for the multiple regression,
R² = 0.485 (m' = 2); hence

$$
z = \tfrac{1}{2}\log_e \frac{0.003 \times 41}{0.515} = -0.716,
$$

which certainly is not significant. Thus, the extraction of some of the
protein by potassium bromide has added nothing to the information
provided by the crude protein content alone, as far as the prediction
of loaf volume for this series of flours is concerned.
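Both tests of section 11.72 are one formula. Setting R₂² = 0 and m₂' = 0 in the test for extra terms gives the test of a single multiple correlation coefficient. This minimal Python sketch reproduces the two z values worked above, 1.48 and −0.716, from the quoted R² values. The function name is hypothetical, and the comparison value 0.84 is simply the 1 per cent point quoted in the text.

```python
import math


def z_for_extra_terms(R2_more, m_more, N, R2_fewer=0.0, m_fewer=0):
    """Fisher's z for the gain from m_fewer to m_more regression terms.
    With R2_fewer = 0 and m_fewer = 0 it tests a single multiple correlation."""
    n1 = m_more - m_fewer
    n2 = N - m_more - 1
    variance_ratio = (R2_more - R2_fewer) * n2 / ((1 - R2_more) * n1)
    return 0.5 * math.log(variance_ratio), n1, n2


z_R, n1, n2 = z_for_extra_terms(0.485, 2, 44)
z_gain, d1, d2 = z_for_extra_terms(0.485, 2, 44, R2_fewer=0.482, m_fewer=1)

print(f"multiple R:      z = {z_R:.2f} for n1 = {n1}, n2 = {n2}")
print(f"adding KBr term: z = {z_gain:.3f} for n1 = {d1}, n2 = {d2}")
```

A negative z, as in the second case, simply means that the variance for the extra term is smaller than the residual variance.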


11.73. We will illustrate the further application of these tests by
another example, which presents several features; the data are in
Table 11.8. Fifty groups of varying numbers of cotton hairs were
weighed and measured, and the weights of the groups were expressed
as multiples of 10⁻⁸ grammes per centimetre. These were then swollen
in caustic soda and placed in three classes, A, B and C, according to
their appearance under the microscope. The problem is to determine
if the three classes show real differences in average hair-weight per
centimetre. A regression formula of the form of equation (11.5)
may be obtained to express the weight of a group of hairs in terms
of the number in each of the three classes; if we put y equal to the
group weight, and x₁, x₂ and x₃ equal to the numbers in the three
classes, a, b and c are the corresponding mean hair-weights per
centimetre. Similarly a second regression may be obtained in which
only the average hair-weight for all classes and the total number of
hairs in the group are used; this is the ordinary case with one constant.
Now the former regression, absorbing three degrees of freedom,
will necessarily have associated with it a little more of the variance
in weights of the groups than the second, but if the three groups
really have different hair-weights, the equation with three constants
will fit the data very much better than that with only one, and the
difference in the associated variances will be significantly greater
than zero. We may therefore investigate the problem by finding the
regressions and variances, and testing the significance of the differ-
ence in the latter.



TABLE 11.8.* Numbers of Cotton Hairs and Weights (in 10⁻⁸ grammes)
* Some of the groups have fractional numbers because one or two hairs were lost between weighing and classifying, and
these were assumed to be distributed proportionately in all classes. The weights were supplied as mean weights per centi-
metre of hair to the nearest unit, and those in this table are the means multiplied by the number of hairs in the group.
The effect of these approximations in the data is exceedingly small.



MULTIPLE AND PARTIAL CORRELATION

To find a, b and c we solve equations (11.6), and for these we
need to determine the sums of squares and products from the data.
It is not fair, however, to give all groups, whether large or small,
the same weight in finding these sums. If the number of hairs
in a group is n, and the variability of weights of single hairs is
approximately the same in all classes, with a standard deviation of
σ, the standard error of the mean weight per hair for the group is
σ/√n, and that of the total weight is n times this, and so is propor-
tional to √n; hence the variance of the total weight of a group due
to variations between hairs within the same class is proportional to
n, the number in the group. We may use the results of section 3.7
and regard 1/n as the quantity of information; then in making
summations, each term may be weighted with this quantity, multi-
plying each product and square by 1/n before summing. For example,
the weighted sum

$$
S\frac{yx_1}{n} = \frac{3243 \times 15.4}{23} + \dots
$$

A further modification is necessary; a group with no hairs naturally
has zero weight, so we shall make our regression line pass through
the origin and not through the mean, and shall measure all deviations
from zero; that is, in equations (11.5) and (11.6) we write x and y
instead of x' and y'. It may assist some to think of the data of
Table 11.8 as a half of the complete data, the other half having
equal negative values, so that the mean is zero, and the sums of
squares and products we obtain are half the totals for the imaginary
complete results.

The appropriate sums of squares and products are set out in the equa-
tions below, and their solution yields the regression equation following
them.

$$
\begin{aligned}
363\,785.7 &= 1\,468.85\,a + 288.83\,b + 18.42\,c \\
73\,472.3 &= 288.83\,a + 74.73\,b + 4.94\,c \\
4\,646.0 &= 18.42\,a + 4.94\,b + 1.04\,c
\end{aligned}
$$

$$
Y = 226.2557\,x_1 + 114.127\,x_2 - 82.16\,x_3
$$

Although the data are to some extent approximate, the arithmetic
must be performed with considerable precision if the final results
are to have any accuracy at all. In finding S(yx₁/n) we calculated
x₁/n to five decimal places, and then summed y × x₁/n on a machine;
as a check, we then calculated y/n and summed x₁ × y/n. In performing
the divisions for the solution of the simultaneous equations we had
to work to nine or ten significant figures to obtain the accuracy of
the constants shown in the regression. It is not claimed that the four
decimal places for a have any physical meaning, but as a constant
of the sample which will be used to find the sum of squares of group
weights, that accuracy is necessary. Classes A and B have differing
mean hair-weights per centimetre, while the hairs of C appear to
have a negative one; however, there were very few of class C, and
this irrational result is due to sampling variations, and signifies
nothing. To find the sum of squares associated with the regression
we use equation (11.7), and find

$$
226.2557 \times 363\,785.7 + 114.127 \times 73\,472.3 - 82.16 \times 4\,646.0 = 90\,312\,000.
$$

This could not have been given correct even to five figures if we
had determined the regression coefficients correct only (say) to
three. The weighted sum of squares of the hair-weights is

$$
S\frac{y^2}{n} = \frac{3243^2}{23} + \dots = 91\,220\,148,
$$

and we will use this to five significant figures only.

We now have to find a regression equation using one constant,
the mean hair-weight per centimetre for all classes. Using only the
first term and the first of equations (11.6), and finding the weighted
sums of products and squares, we have

$$
S\frac{yx_1}{n} = a'S\frac{x_1^2}{n},
$$

and since there is only one class, x₁ = n, and we obtain Sy = a'Sn,
which is the straightforward relationship: mean hair-weight = total
weight of all groups divided by the total number of hairs. This gives
a' = 203.736, and the sum of squares of weights associated with
it is 203.736 × 441 904 = 90 032 000 (correct to five significant
figures).
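Both regressions through the origin can be sketched in Python on hypothetical groups, since Table 11.8 is not reproduced here. Each group is weighted by 1/n, its quantity of information. The three-constant fit solves the weighted normal equations by ordinary Gaussian elimination. The one-constant fit reduces to a' = Sy/Sn. The group counts and "true" class weights are invented and noise-free, so the class constants should be recovered exactly; with real data, the point about carrying many figures would apply. All function names are hypothetical.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))) / M[r][r]
    return sol


def weighted_origin_fit(y, X, n):
    """Regression through the origin, each group weighted by 1/n.
    Returns the coefficients and the weighted sums S(y x_i / n)."""
    k = len(X[0])
    Sxx = [[sum(X[g][i] * X[g][j] / n[g] for g in range(len(y))) for j in range(k)]
           for i in range(k)]
    Sxy = [sum(y[g] * X[g][i] / n[g] for g in range(len(y))) for i in range(k)]
    return solve(Sxx, Sxy), Sxy


# Hypothetical groups: numbers of hairs in classes A, B, C.
counts = [(15, 8, 0), (10, 12, 1), (20, 5, 2), (8, 14, 0), (12, 9, 3), (18, 6, 1)]
true_weights = (220.0, 120.0, 180.0)  # made-up mean hair-weights per class
n = [sum(c) for c in counts]
y = [sum(w * k for w, k in zip(true_weights, c)) for c in counts]  # noise-free

coef, Sxy = weighted_origin_fit(y, [list(c) for c in counts], n)
ss_regression = sum(c * s for c, s in zip(coef, Sxy))  # equation (11.7)
ss_total = sum(yy * yy / nn for yy, nn in zip(y, n))    # S(y^2/n)

# One constant: x = n, so S(yx/n) = Sy and S(x^2/n) = Sn.
a_single = sum(y) / sum(n)
ss_single = a_single * sum(y)

print("class constants:", [round(c, 6) for c in coef])
print(f"single mean hair-weight: {a_single:.3f}")
```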

The analysis of variance is in Table 11.9, and in reckoning the
degrees of freedom it must be remembered that deviations are not
measured from a mean determined from the data, but from the
origin, so that the fifty groups contribute fifty degrees to the sum of
squares. In order to test the significance of the difference in regres-
sions, we find

$$
z = \tfrac{1}{2}\log_e \frac{140\,000}{19\,300} = 0.99,
$$

and this is significant, for when n₁ = 2 and n₂ = 30, z = 0.84
lies on the 1 per cent level. Thus we conclude that the two classes of
hairs A and B, as determined by swelling in caustic soda, have sig-
nificantly different mean hair-weights per centimetre. This result
could not have been obtained directly, because there were no means
of weighing the hairs individually, and it was not practicable to
obtain the individual classes for separate weighing.


TABLE 11.9

Analysis of Variance of Weights of Groups of Cotton Hairs

Source of Variation           Sums of Squares   Degrees of Freedom   Variance
Multiple regression              90 312 000              3
  Simple regression              90 032 000              1           90 032 000
  Difference in regression          280 000              2              140 000
Residual                            908 000             47               19 300
Total                            91 220 000             50
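The entries of Table 11.9 and the z used above follow from three sums of squares. This short Python sketch derives the difference and residual lines by subtraction. As the text says, the total has fifty degrees of freedom because deviations are taken from the origin. The variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from Table 11.9.
ss_multiple, df_multiple = 90_312_000, 3
ss_simple, df_simple = 90_032_000, 1
ss_total, df_total = 91_220_000, 50  # deviations from the origin: one d.f. per group

ss_difference, df_difference = ss_multiple - ss_simple, df_multiple - df_simple
ss_residual, df_residual = ss_total - ss_multiple, df_total - df_multiple

var_difference = ss_difference / df_difference
var_residual = ss_residual / df_residual
z = 0.5 * math.log(var_difference / var_residual)

print(f"difference: {ss_difference} on {df_difference} d.f., variance {var_difference:.0f}")
print(f"residual:   {ss_residual} on {df_residual} d.f., variance {var_residual:.0f}")
print(f"z = {z:.2f}")
```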


11.8. We must emphasise the underlying assumption of the methods
of this chapter that the residuals are homogeneous in the sense that
their variance and co-variance are constant. In the example of
section 11.1, for instance, we assume the relation between residual
straw and grain yields is sensibly the same for all treatments and
blocks, and in the example of section 11.5 illustrating multiple
regression we are assuming that the relation between loaf volume
and protein extracted is the same for all values of crude protein.
The tests of significance also assume the residual deviations to be
normally distributed.
APPENDIX


Proof of the Formula for the Partial Correlation Coefficient

Let x and y be two variables, and let z be the third, which is to be
eliminated. Then the regressions of x and y on z (measuring all as
deviations from the mean) are

$$
X' = r_{xz}\sqrt{\frac{Sx'^2}{Sz'^2}}\,z' \quad\text{and}\quad Y' = r_{yz}\sqrt{\frac{Sy'^2}{Sz'^2}}\,z', \qquad (11.8)
$$

and the partial correlation coefficient between x and y is

$$
r_{xy\cdot z} = \frac{S(x' - X')(y' - Y')}{\sqrt{S(x' - X')^2\,S(y' - Y')^2}}. \qquad (11.9)
$$

Expanding the numerator of (11.9),

$$
S(x' - X')(y' - Y') = Sx'y' + SX'Y' - Sx'Y' - SX'y',
$$

and substituting from (11.8) for X' and Y',

$$
\begin{aligned}
S(x' - X')(y' - Y') &= Sx'y' + r_{xz}r_{yz}\sqrt{Sx'^2Sy'^2} - 2r_{xz}r_{yz}\sqrt{Sx'^2Sy'^2} \\
&= Sx'y' - r_{xz}r_{yz}\sqrt{Sx'^2Sy'^2} \\
&= \sqrt{Sx'^2Sy'^2}\,(r_{xy} - r_{xz}r_{yz}). \qquad (11.91)
\end{aligned}
$$

Expanding the first term in the denominator of (11.9),

$$
S(x' - X')^2 = Sx'^2 + SX'^2 - 2Sx'X',
$$
level of significance, 70
— and probability, distinction, 76, 136
likelihood, 95 
loaf volume, 192, 257 

Mackenzie, 123 
Mahalanobis, 118, 120 
maize seedlings, 113
marriages in churches, 162
mathematical probability, 44
Matthews, 23 
maximum likelihood, 95 
McLane, 114 
mean, 28 

— , differences expressed in variance, 128
— , sampling distribution, 63
— , significance, 70, 71
— , — , parallel pairs, 231
— , — , small samples, 112, 114
— , standard error, 62, 63
— , standard error in complex population, 204
mean deviation, 30
— , standard error, 85

mean square contingency, 20a 
median, 29 
Mendel, 103 
mice, 83 

Miner, 251

mode, 26, 29 

moments, computation, 34 
— , deduction from continuous frequency curve, 55
— , second, 30 
— , third and fourth, 33 
mortar, iSa 
Morton, 141 

multiple factor experiment, 235
Mumford, 251 

Nayer, 120 
Neyman, 74, 90, 120 
non-normal frequency curves, 61
normal distribution, 54, 270
— , determination of frequencies, 56
— , moments, 55
— , practical applicability, 60
— , relation between frequencies and standard deviation, 59
normal surface, 149
normality, tests for, 86 

occurrence, 47 
odds, see probability 
ogive, 20 
Orensteen, 180 
orthogonal equations, 197 
orthogonality, 241 
ovules per ovary, 126, 191 

pairs, variance from, 111
parameters, 92 
parents and children, 161 
Parkes, 83 

peakiness, see kurtosis 
Pearl, 22 
Pearse, 23 

Pearson, E. S., 74, 79, 85, 86, 120, 
138 

Pearson, K., 18, 31, 33, 39, 57, 86, 
98, 100, 106, 176, 190 
peas, 103, 121 
Peter’s method, 30 
physical determinations, random 
errors, 69 
pistils, 145, 177
Poisson distribution, 48 
— , small samples, 123
— , standard error of mean, 84
— , tables, 49
— , testing randomness, 51
polynomial equation, 195
— , system of fitting, 197
population, 16

— , determination from sample, 89 
— infinite, 45, 62, 67 
Prentice, 123, 207 
probability, 43 
— , test of laws, 49


probability integral, 56 
probable error, 71 
product moment, 165 

— , computation, 166

proportionate frequencies, 28 
protein content, of flour, 257 

— , of wheat, 174, 193, 197, 254

Przyborowski, 54 

qualitative and quantitative variate, 
16 

quantity of information, 94, 178, 
265 

quartile deviation, 30 
— , relation to standard deviation, 60

rabbits, 115, 119 
rainfall and sunshine, 161 
random groups or blocks, 230 
random samples, 45, 67 

— , complex theory, 204
— , technique, 67

random sampling numbers, 68 
randomness, tests of, 49 
range, 31 

— , standard error, 85 
ratio between two means, standard 
error, 88 
reaction time, 148 
recruits, 20 
regression, 149 
— , as variance, 193, 200 
— , line through origin, 265 
— , linear, 149
— , multiple, 256
— , non-linear, 189
— , partial, 256
— , tests for linearity, 192, 200
regression coefficients, 155
— , computation, 165
— , estimation, 164
— , significance, 181
— , standard error, 182

Reid, 182 

representative samples, 67 
residuals, 130 
— , correlation of, 244, 253 
routine analysis, determination of errors, 206

sample, 16
— , random, 45, 67
— , small, 110
sampling, by strata, 209
— , unrestricted random method, 209

sampling distribution, 62
— , skew, 80
— , of various constants, see under the constants

sampling from limited field, 209
sampling technique, 68
sampling theory, complex populations, 205
— , small samples, 110

Sanders, 240, 250 
scatter, see dispersion 
scatter diagram, 140 
Scotsmen, heights, 73, 84, 118 
semi-interquartile distance, 31 
set, 47 

sex-ratio of rats, 83, 104 
shape of distribution, 33 
Sheppard, 57 

Sheppard's corrections, 39, 131
— , effect on tests for correlation, 188

significance, statistical, 71 
— , from skew distribution, 80 
— , tests of, 69 

— , see also under various constants
skew sampling distributions, tests of
significance, 80
skewness, 21, 34 
— , test for, 86 
small numbers, law of, 49 
smallpox, 106, 203 
small samples, 110
Smith, 123, 207 
smoothing, 201 
Snow, 161, 177 
Soper, 49 

spectacle glass, 218 
spiral reversal, 20 
spurious correlation, 254 


aSs 

stamens, 145, 177 
standard deviation, 30 

— , estimated from two samples, 73
— , relation to normal frequencies, 59
— , standard error, 84

standard error, 62
— , complex sample, 204
— , of various constants, see under the constants
statistical method, 15 
statistical probability, 44 
statistical significance, see significance
statistics, 92 
strata, 209 
Student, 52, 113 
sub-range, 19 
summation, rules, 37 
sunshine and rainfall, 161 
survivor curve, 20 

t test, 112, 271
— , essential character, 114 
— relation to z test, 134 
temperatures, maximum and mini- 
mum, 147, 247 
termites, 137, 211 
tern eggs, 142, 157 
Thornton, 123 
time series, 197, 201 

— , correlation of, 254

Tippett, 54, 68, 120 
Tower, 23 

transformed variate, 36 
trial, 47
triplet births, 177 
Tschepourkowsky, 178 
Turner, 70 

universe, 16 

vaccination, 106, 203 
variability, 30 
variable, 16 

— , dependent and independent, 154
variance, 30 
— , additive nature, 125 

variance, analysis, 125
— , analysis, computation, 130, 134, 217, 232
— , analysis, effect of skewness, 138
— , analysis into many parts, 213
— , associated with regression, 156, 192, 200, 259
— , computation, 34
— , estimation from small samples, 110
— , sampling errors in small samples, 117, 120

— , standard error, 88 
variate, 16 
vice-counties, 23
vital capacity, 251
vitamin B, 83, 105

waltzing mice, 83 
Warren, 137 

weaving of cotton warps, 235 


weighting, see quantity of informa- 
tion 

Weldon, 145, 161 
Wheldale, 102 
Wilenski, 54 
Willis, 26
Winter, 175 
Wishart, 240, 250, 262 
working mean, 36 

Yates, 18, 234, 241 
yeast cells, 51, 84 
Young, 251 
Yule, 161, 162

z, interpolation in tables of, 119
z test, 117, 272
— , relation to t test, 134
z, transformation for correlation coefficient, 176
Zinn, 192